Date: [illegible]  Express Mail Label No. [illegible] US



Inventors: Thomas J. Hudson, James Engert and Andrea Richter 

Attorney's Docket No.: 2825.1021-003

IDENTIFICATION OF ARSACS MUTATIONS AND METHODS OF USE THEREFOR

RELATED APPLICATIONS 

This application claims the benefit of U.S. Provisional Application Serial No.
60/160,588, filed October 20, 1999, the entire teachings of which are incorporated
herein by reference.

BACKGROUND OF THE INVENTION 

Autosomal recessive spastic ataxia of Charlevoix-Saguenay (ARSACS) is an
early-onset neurodegenerative disease with high prevalence in the
Charlevoix-Saguenay-Lac-Saint-Jean (CSLSJ) region of Quebec. Disease progression is rapid
through young adulthood, with most patients requiring wheelchairs by their early
forties. The disease is characterized by abolished sensory nerve conduction, reduced
motor nerve velocity, and a unique clinical feature of hypermyelination of retinal nerve
fibers. Additional pathological features include atrophy of the upper cerebellar vermis,
absence of Purkinje cells, and possibly abnormal neuronal lipid storage (Bouchard, J-P.,
In: Handbook of Clinical Neurology 16: Hereditary neuropathies and spinocerebellar
degenerations, J.M.B.V. de Jong, Ed., pp. 451-459, Elsevier Science Publishers,
Amsterdam (1991)). A developmental defect in the myelination of both retinal and
peripheral nerve fibers has been proposed as the physiological basis of the disease
(Bouchard, J-P., et al., Neuromuscular Disorders 8:474-479 (1998)). More than 300
patients have been identified, and the estimated carrier frequency is 1 in 22 in the
CSLSJ population of northeastern Quebec (3).

SUMMARY OF THE INVENTION 

As described herein, the ARSACS gene, referred to herein as "spastin" (also
known as sacsin), has been mapped to chromosome 13q11 by linkage analysis and
cloned from human, mouse and hamster. The gene was identified by using
fine-structure linkage disequilibrium (LD) mapping to narrow the disease interval and then
performing sample-sequencing to identify candidate genes. The spastin gene has a
remarkable feature in that it contains a large exon spanning at least 12,794 base pairs of
genomic DNA and comprises an open reading frame of 11,487 base pairs. As described
herein, the gene is highly conserved in mouse. This exon of spastin is the largest found
in any vertebrate organism. The deduced protein contains three large domains with
sequence similarity to each other, as well as to the protein predicted to be encoded by an
open reading frame identified in Arabidopsis genomic DNA. These domains contain a
subdomain with sequence similarity to heat-shock proteins, suggesting a role in
chaperone-mediated protein folding. Spastin appears to be expressed in a wide variety
of tissues including brain and central nervous system. Alterations in the spastin gene
have been identified as described herein which correlate strongly with ARSACS,
including at least two alterations which have severe effects on the encoded protein,
providing strong evidence that mutations in the open reading frame of the spastin gene
are responsible for ARSACS.

The present invention relates to an isolated nucleic acid molecule comprising a
spastin gene or portion of said gene as described herein. In one embodiment, the
invention relates to an isolated nucleic acid molecule comprising a nucleotide sequence
selected from the group consisting of SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14 and 15 and
the complement of SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14 and 15. In another
embodiment, the invention relates to an isolated nucleic acid molecule comprising an
exon from a vertebrate gene wherein said exon is at least 1 150 base pairs in length. The
invention also relates to an isolated nucleic acid molecule consisting of a nucleotide
sequence selected from the group consisting of SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14
and 15 and the complement of SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14 and 15. In a
preferred embodiment, the genes of the invention are human genes. The invention also
relates to an isolated nucleic acid molecule consisting of a nucleotide sequence selected
from the group consisting of SEQ ID NOS: 21-66 and the complement of SEQ ID NOS:
21-66.

The present invention also includes fragments of the spastin genes described
herein. For example, the invention relates to an isolated portion of a nucleic acid
sequence selected from the group consisting of SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14
and 15 and the complement of SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14 and 15, wherein
the portion is at least about 10 nucleotides in length.

The invention also relates to nucleic acid molecules having substantial sequence
identity to the specific sequences disclosed herein. In one embodiment, the invention
relates to a nucleic acid molecule comprising a nucleotide sequence which is at least
about 60% identical to a nucleotide sequence selected from the group consisting of SEQ
ID NOS: 1, 3, 7, 9, 11, 12, 13, 14 and 15 and the complement of SEQ ID NOS: 1, 3, 7,
9, 11, 12, 13, 14 and 15. In another embodiment, the invention relates to a nucleic acid
molecule which hybridizes under high stringency conditions to a nucleotide sequence
selected from the group consisting of SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14 and 15 and
the complement of SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14 and 15.
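
By way of non-limiting illustration, percent identity between two sequences that
have already been aligned can be computed as sketched below. The function names, the
handling of gap columns, and the use of an exact 60% cutoff to stand in for "at least
about 60%" are assumptions made for this example only and are not taken from the
specification.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns holding the same residue.

    Both inputs must come from the same pairwise alignment, so they have
    equal length and use '-' for gaps. Columns where both sequences have
    a gap are ignored; a gap opposite a residue counts as a mismatch.
    (These conventions are assumptions of this sketch.)
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment contains no comparable columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 60.0) -> bool:
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For instance, two aligned ten-nucleotide sequences that differ at two columns are
80% identical and would satisfy the illustrative 60% cutoff.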

The nucleic acid molecules of the present invention, or portions thereof, can be
used as probes to isolate and/or clone substantially similar or functionally equivalent
homologues of the spastin family of genes. The polynucleotides of the present
invention can also be used as probes to detect and/or measure expression of the genes
encoded by the present invention. The probes of the present invention can be DNA,
RNA or PNA. Expression assays, such as Southern blot analysis and whole mount in
situ hybridization, are well known in the art. The polynucleotides of the present
invention, or portions thereof, can also be used as primers to clone homologues or
family members by PCR using techniques well known in the art.

The invention further relates to nucleic acid constructs comprising the isolated
nucleic acid molecules of the invention, as well as to a recombinant host cell comprising
the isolated nucleic acid molecules of the invention. The invention further relates to a
method for preparing a polypeptide encoded by an isolated nucleic acid molecule of the
invention, comprising culturing the recombinant host cells of the invention.

Also encompassed by the present invention are isolated polypeptides encoded by
nucleic acid molecules described herein. For example, the invention relates to an
isolated polypeptide comprising an amino acid sequence selected from the group
consisting of SEQ ID NOS: 2, 4, 8, 10, 16 and 67-69. The invention also relates to an
isolated polypeptide comprising an amino acid sequence having greater than 75%
identity to an amino acid sequence selected from the group consisting of SEQ ID NOS:
2, 4, 8, 10, 16 and 67-69. The invention also provides antibodies, and antigen binding
fragments thereof, to the polypeptides of the invention, particularly antibodies and
antigen binding fragments thereof which specifically bind the polypeptides described
herein.

The invention also provides a method for assaying the presence of a nucleic acid
molecule in a sample, comprising contacting said sample with a nucleotide sequence
selected from the group consisting of SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14, 15, 17-66,
72 and 73; the complement of SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14, 15, 17-66, 72 and
73; a portion of any one of SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14, 15, 17-66, 72 and 73
which is at least 10 nucleotides in length; and a portion of the complement of any one of
SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14, 15, 17-66, 72 and 73 which is at least 10
nucleotides in length, under conditions appropriate for selective hybridization of the
sequence to the nucleic acid molecule in the sample. Presence or absence of a
hybridization signal indicates presence or absence, respectively, of the target nucleic
acid molecule. The invention also relates to a method for assaying the presence of a
polypeptide encoded by an isolated nucleic acid molecule of the invention in a sample,
comprising contacting said sample with an antibody which specifically binds to the
encoded polypeptide.

The invention further relates to a method of diagnosing or aiding in the
diagnosis of neurodegenerative disease in an individual comprising obtaining a nucleic
acid sample from the individual and determining the nucleotide present at nucleotide
position 5254 of SEQ ID NO: 1, wherein presence of a thymine at said position is
indicative of increased likelihood of neurodegenerative disease in the individual as
compared with an appropriate control, e.g., an individual having a cytosine at said
position. The invention also relates to a method of diagnosing or aiding in the diagnosis
of neurodegenerative disease in an individual comprising obtaining a nucleic acid
sample from the individual and determining whether there is a deletion of a thymine at
nucleotide position 6594 of SEQ ID NO: 1, wherein deletion of a thymine at said
position is indicative of increased likelihood of neurodegenerative disease in the
individual as compared with an appropriate control, e.g., an individual who does not
have a deletion at said position.
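
Purely as an illustrative sketch, the two determinations described above can be
expressed computationally for a sample sequence that is already in the coordinate
system of the reference sequence. The function names and return labels below are
hypothetical, and the sketch assumes an error-free sample read; it is not a
description of any particular laboratory assay.

```python
def base_at(sequence: str, position: int) -> str:
    """Return the base at a 1-based nucleotide position."""
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    return sequence[position - 1].upper()


def call_position_5254(sample: str) -> str:
    """Classify the base at position 5254 (1-based).

    Per the text, thymine indicates increased likelihood of disease and
    cytosine is the control state; the labels are illustrative.
    """
    base = base_at(sample, 5254)
    if base == "T":
        return "variant"
    if base == "C":
        return "reference"
    return "other"


def has_single_base_deletion(reference: str, sample: str, position: int) -> bool:
    """True if `sample` equals `reference` with one base removed at `position`."""
    index = position - 1
    expected = reference[:index] + reference[index + 1:]
    return sample == expected
```

Because position 5254 lies upstream of position 6594, a deletion at 6594 does not
shift the coordinate used by `call_position_5254`.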

The invention also relates to a method of treating a neurodegenerative disorder
associated with the presence of a thymine at nucleotide position 5254 of SEQ ID NO: 1
in an individual, comprising administering to the individual an agent selected from the
group consisting of a polypeptide encoded by SEQ ID NO: 2 or an active portion
thereof, a nucleic acid molecule which encodes SEQ ID NO: 2 or an active portion of
SEQ ID NO: 2, and an agonist of SEQ ID NO: 2. The invention further relates to a
method of treating a neurodegenerative disorder associated with a deletion at nucleotide
position 6594 of SEQ ID NO: 1 in an individual, comprising administering to the
individual an agent selected from the group consisting of a polypeptide encoded by SEQ
ID NO: 2 or an active portion thereof, a nucleic acid molecule which encodes SEQ ID
NO: 2 or an active portion of SEQ ID NO: 2, and an agonist of SEQ ID NO: 2.

The invention also encompasses a method of diagnosing or aiding in the
diagnosis of neurodegenerative disease associated with the presence of a thymine at
nucleotide position 5254 of SEQ ID NO: 1 in an individual, comprising obtaining a
sample comprising a Spastin polypeptide from the individual and determining the size
of the Spastin polypeptide, wherein if the Spastin polypeptide is significantly shorter
than SEQ ID NO: 2 it is indicative of neurodegenerative disease. The invention also
provides a method of diagnosing or aiding in the diagnosis of neurodegenerative disease
associated with the presence of a deletion at nucleotide position 6594 of SEQ ID NO: 1
in an individual, comprising obtaining a sample comprising a Spastin polypeptide from
the individual and determining the size of the Spastin polypeptide, wherein if the
Spastin polypeptide is significantly shorter than SEQ ID NO: 2 it is indicative of
neurodegenerative disease. In one embodiment, the Spastin polypeptide is significantly
shorter than SEQ ID NO: 2 if the Spastin polypeptide comprises less than about 75% of
the amino acids of SEQ ID NO: 2.
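
As a minimal sketch of the size criterion in this embodiment, the comparison can be
written as follows. The full length of 3829 amino acids is the figure given herein for
SEQ ID NO: 2; the function name and the treatment of "about 75%" as an exact 0.75
cutoff are assumptions of the example.

```python
FULL_LENGTH_AA = 3829  # length of SEQ ID NO: 2 as stated in the specification


def is_significantly_shorter(observed_aa: int,
                             full_length_aa: int = FULL_LENGTH_AA,
                             cutoff: float = 0.75) -> bool:
    """True if the observed polypeptide has fewer than `cutoff` of the
    full-length amino acids (an exact cutoff is an assumption here)."""
    if observed_aa < 0 or full_length_aa <= 0:
        raise ValueError("lengths must be positive")
    return observed_aa < cutoff * full_length_aa
```

Under this cutoff, a polypeptide of 2871 or fewer amino acids would be classed as
significantly shorter, since 75% of 3829 is 2871.75.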

In one embodiment, the neurodegenerative disease comprises one or more
symptoms selected from the group consisting of: reduced sensory nerve conduction,
reduced motor nerve velocity, hypermyelination of retinal nerve fibers, atrophy of upper
cerebellar vermis, absence of Purkinje cells and abnormal neuronal lipid storage. In a
particular embodiment, the nucleic acid sample is obtained from a tissue selected from
the group consisting of: brain tissue, CNS, lung, fetal lung, testis, lymphocytes, adipose,
fibroblasts, skeletal muscle, pancreas, uterus, kidney, tonsil, embryo and isolated cells
thereof. For example, brain tissue can be selected from the group consisting of cerebral
cortex, granular cell layer of the cerebellum and hippocampus. In a particular
embodiment, the neurodegenerative disease is an early onset neurodegenerative disease.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a schematic diagram of the structure and organization of the spastin
gene. Markers used for the genetic map of the spastin gene are shown above. SGCG is
the sarcoglycan, gamma gene. hCIT 26_L_1 and hCIT 235_L_20, the overlapping
clones that contain the spastin ORF, are 110 kilobases (kb) and 60 kb, respectively.
Exploded view shows the location of the spastin gene. The thick bar is the predicted
coding region. The thin bars represent the 5' and 3' UTRs. M is the first methionine. S
is the location of the introduced stop codon found on the minor haplotype. Δ indicates
the location of the deleted base pair found on the major haplotype. AB018273 is the
mRNA sequence KIAA0730 (42), which is part of a UniGene cluster (Hs.159492)
containing 32 ESTs. R17106, AA776169, AA776670, and AA897178 are additional
ESTs with homology to the spastin gene.

Figures 2A-2B show the results of sequence analysis and identification of
spastin mutations found on ARSACS chromosomes. The sequences displayed are from
direct sequencing of PCR products and flank the two mutations (indicated by arrows)
found on ARSACS chromosomes. Nucleotide numbering is from the putative initiation
codon. Figure 2A shows nucleotide 6594 (codon 2198) for an unaffected individual
(top panel) and a homozygous affected individual (bottom panel). Figure 2B shows
nucleotide 5254 (codon 1752) for an unaffected individual (top panel) and an affected
compound heterozygous individual (bottom panel).

Figure 3 shows a Northern blot analysis of spastin mRNA. A ³²P-labelled 1.8 kb
cDNA fragment from the 3' end of the spastin gene (Image clone #279258) was
hybridized to a blot of fibroblast RNA and to a multiple tissue blot (MTN, Clontech).
Lanes 1-5 contain patient fibroblast RNAs and lane 6 contains control fibroblast RNA.
The lanes of the MTN blot correspond to the following tissues: 7, heart; 8, brain; 9,
placenta; 10, lung; 11, liver; 12, skeletal muscle; 13, kidney; and 14, pancreas. The
marker (M) is the 0.24-9.5 kb RNA ladder (Life Technologies).

Figures 4A-4B are schematic representations of the Spastin protein and relevant
homologies. Figure 4A shows a schematic representation of the Spastin protein and
location of motifs. Rep. 1, 2, and 3 represent the domains with homology (28, 30 and
21% identity, respectively) to the Arabidopsis open reading frame. Figure 4B shows
homology between the two Hsp90 domains of Spastin, the first mouse domain, the
Arabidopsis open reading frame (GenBank accession #AB006708), and the yeast Hsp90
(GenBank accession #3401959). Alignment was performed with ClustalW (1.7) (43)
through the BCM Search Launcher interface (34) with the BLOSUM weight matrix.
The numbering for all sequences is from the first methionine (nucleotide 50,773 is the
first methionine of the Arabidopsis open reading frame).

Figures 5A-5C show the alignment of the human Spastin with the mouse
Spastin. Identical amino acids and gaps are represented by dots and hyphens,
respectively. Light gray shading denotes the self-homologous region containing the
Hsp90 homology; dark gray shading highlights the DnaJ region. The boxed sequences
represent leucine zipper motifs, underlined sequences represent coiled coil domains, and
the boxed and underlined sequence delineates the putative hydrophilic region. The first
coiled coil domain is interrupted by a proline in the mouse sequence.

Figure 6 is a table showing ESTs identified by sample-sequencing of the
ARSACS critical interval.

Figure 7 is a table showing primers for PCR amplification of the human spastin
gene.

Figures 8A-8G show the complete exon (SEQ ID NO: 3) of the murine spastin
gene.

Figures 9A-9F show the complete exon (SEQ ID NO: 1) of the human spastin
gene.

DETAILED DESCRIPTION OF THE INVENTION

The gene responsible for ARSACS was mapped to chromosome region 13q11
by genotyping 322 microsatellite markers in a genome-wide scan and noting a high
degree of homozygosity at locus D13S787 (Bouchard, J-P., et al., Neuromuscular
Disorders 8:474-479 (1998)). Extensive genetic analysis of the region defined a
maximum multi-point LOD score of 42.3 and revealed a major conserved haplotype
among ARSACS chromosomes in an 11.1 cM region flanked by D13S1236 and
D13S1285 (5). Two groups of ARSACS haplotypes were found between D13S1275
and D13S292. The overwhelming majority (96%) of ARSACS chromosomes carried a
single haplotype, defined by D13S232 and two single nucleotide polymorphisms (SNPs)
within the sarcoglycan, gamma gene (SGCG). Location score analysis demonstrated
that the most likely position of the ARSACS gene was between D13S232 and D13S292 (the
critical interval) (5).

A high-resolution physical and transcript map of the ARSACS critical interval
was constructed in yeast artificial chromosomes (YACs), bacterial artificial
chromosomes (BACs) and plasmid artificial chromosomes (PACs). The identification
of the ARSACS gene (i.e., a gene in which alteration is associated with ARSACS) was
carried out as described herein by performing sample-sequencing of six BAC and PAC
clones spanning about 450 kilobases (kb) included in the critical interval. Analysis of
the sample sequences revealed human ESTs (Figure 6) and the presence of two known
genes: sodium/potassium-ATPase (ATP1AL1), which was excluded on the basis of
recombination in ARSACS families, and SGCG, a gene in which no sequence variants
unique to ARSACS chromosomes were found.

A 20 kb sequence contig revealed a huge genomic open reading frame (ORF) of
11,487 base pairs that encodes 3829 amino acids (SEQ ID NO: 2). The ORF begins
with an AUG codon preceded by an in-frame stop codon 75 bp
upstream and continues for a total of 3,829 codons before encountering a stop codon.
One large cDNA (KIAA0730) derived from a brain library and over 30 ESTs overlap
the ORF and allowed the determination of the 3' untranslated region (UTR), which
extends 1,307 bp to a polyadenylation site (Figure 1). The existence of this gigantic
exon was confirmed by analyzing RT-PCR products spanning the entire mRNA; this
analysis showed perfect correspondence between the mRNA and genomic DNA
sequence. Thus, the total length of the exon must be at least 12,794 bp. A probe
derived from within this sequence detects a transcript of approximately 12.8 kb on a
Northern blot, suggesting that the identified exonic sequence may constitute an
intronless gene, although the possibility of a small 5' exon cannot be excluded.
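
The lengths stated above are mutually consistent, as the following short sketch
shows: the ORF length is a whole number of codons, and the minimum exon length equals
the ORF plus the 3' UTR. The constant and function names are hypothetical and serve
only to make the arithmetic explicit.

```python
ORF_BP = 11_487   # ORF length stated in the text
UTR3_BP = 1_307   # 3' UTR length stated in the text


def codon_count(orf_bp: int) -> int:
    """Number of codons in an ORF whose length is a multiple of three."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return orf_bp // 3


def minimum_exon_length(orf_bp: int, utr3_bp: int) -> int:
    """Lower bound on exon length: coding region plus 3' UTR.

    Any 5' untranslated sequence in the same exon would add to this
    figure, which is why the text says "at least".
    """
    return orf_bp + utr3_bp
```

With the stated values, `codon_count(ORF_BP)` gives 3829 and
`minimum_exon_length(ORF_BP, UTR3_BP)` gives 12,794 bp, in agreement with the
approximately 12.8 kb transcript seen on the Northern blot.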

To characterize the full sequence of the ORF and to identify potential
disease-causing mutations, PCR products from ARSACS patient and control DNA were
sequenced. The primers for these reactions are shown in Figure 7. A single-base
deletion of a thymine at position 6594 (6594ΔT) (Figure 2A) was found on all copies of
the major ancestral haplotype examined (a total of 32 chromosomes), but was absent in
all chromosomes of carrier parents that were not transmitted to ARSACS offspring.
This mutation causes a frameshift and results in a subsequent stop codon that truncates
the final 43% of the predicted protein. A second mutation, a nonsense
substitution of a thymine for a cytosine at nucleotide position 5254 (C5254T) (Figure
2B), results in the substitution of a stop codon for an arginine and was found on the
minor ARSACS haplotype carried in a heterozygous state (in trans to the major
ARSACS mutation) in six patients from two families (5). Both mutations are thus
completely associated with their respective core haplotypes and are predicted to have
severe effects on the encoded protein. The presence of these two mutations provides
strong evidence that mutations in this ORF are responsible for ARSACS. The gene is
referred to herein as spastin (gene symbol: SPAS).
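
The codon numbers and the extent of truncation quoted for the two mutations can be
checked with the simple arithmetic sketched below, which assumes nucleotide numbering
from the first base of the initiation codon and a protein of 3829 amino acids. The
function names are hypothetical. The sketch ignores any aberrant residues translated
between a frameshift and the next stop codon, so the fraction is approximate for the
deletion.

```python
from typing import Tuple

TOTAL_CODONS = 3829  # amino acids encoded by the ORF, per the text


def codon_of(nucleotide_position: int) -> Tuple[int, int]:
    """Map a 1-based ORF nucleotide position to (codon number, position in codon)."""
    if nucleotide_position < 1:
        raise ValueError("positions are 1-based")
    codon = (nucleotide_position - 1) // 3 + 1
    offset = (nucleotide_position - 1) % 3 + 1
    return codon, offset


def fraction_lost(first_affected_codon: int,
                  total_codons: int = TOTAL_CODONS) -> float:
    """Fraction of the native protein from the affected codon to the C-terminus."""
    return (total_codons - first_affected_codon + 1) / total_codons


# Position 6594 is the third base of codon 2198; position 5254 is the
# first base of codon 1752.
deletion_codon, _ = codon_of(6594)
nonsense_codon, _ = codon_of(5254)
print(f"6594: codon {deletion_codon}, ~{fraction_lost(deletion_codon):.0%} lost")
print(f"5254: codon {nonsense_codon}, ~{fraction_lost(nonsense_codon):.0%} lost")
```

The result for position 6594 (about 43% of the protein lost) matches the figure
given above, and the nonsense mutation at codon 1752 removes a somewhat larger share
of the protein.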

In the course of the complete resequencing of the spastin gene in ARSACS
patients, additional sequence variants were found which proved to be polymorphisms
found on non-ARSACS-bearing chromosomes as well. These included four silent
substitutions: substitution of a thymine for a cytosine at nucleotide position 3945,
substitution of a cytosine for a thymine at nucleotide position 6603, substitution of a
thymine for a cytosine at nucleotide position 7731, and substitution of a thymine for a
cytosine at nucleotide position 10054 (C3945T, T6603C, C7731T and C10054T,
respectively), and an amino acid-altering substitution of a cytosine for a thymine at
nucleotide position 7856 (T7856C) that results in the substitution of an alanine for a
valine in the predicted protein.

Spastin mRNA was detected by Northern blot analysis in fibroblasts, brain and
skeletal muscle (Figure 3, lanes 1-6, 8 and 12) and at very low levels in pancreas
(Figure 3, lane 14). A single transcript of roughly 12.8 kb was seen in all cases. Spastin
mRNA was expressed in the fibroblasts of ARSACS patients (Figure 3, lanes 1-5) at the
same size as controls, which is not unexpected because both mutations alter only a
single nucleotide.

To examine the tissue expression pattern of spastin more closely, in situ
hybridizations were performed. Human, monkey, and rat brain all demonstrated high
levels of staining, which included all layers of the cerebral cortex and the granular cell
layer of the cerebellum. In a sagittal section of the adult rat brain, strong labeling was
seen in most if not all areas of the central nervous system (CNS). Particularly intense
labeling was observed in the hippocampus. No labeling was seen with the sense probe.
In addition, specific staining of spastin mRNA was seen throughout the CNS of the
18-19 day fetal rat. Background staining with the sense probe does not include the CNS.
Spastin ESTs were identified from the cDNA libraries of many tissues including brain,
uterus, kidney, tonsils, liver, and T cells. Transcripts from brain and multiple sclerosis
libraries comprise 13 of the 35 human ESTs with homology to spastin. Taken together,
these lines of evidence indicate that spastin is expressed in a variety of tissues,
including many that are neural-derived.

On the basis of its amino acid sequence, the Spastin protein product is predicted
to have a molecular weight of 437 kD and a pI of 6.85. Structure prediction programs
suggest the presence of two leucine zippers, three coiled coils and a hydrophilic domain,
all within the C-terminal half of the protein (Figures 4A and 5A-5C). The predicted
protein product does not show extensive similarity to any known protein, based on
analyses using a variety of different sequence comparison tools. However, two related
motifs were identified. The C-terminal portion of the predicted protein contains a 'DnaJ'
protein motif (Figures 4A and 5A-5C, residues 3574-3590). Both human and mouse
proteins also contain three large segments with sequence similarity to each other, of
which two have homology to the N-terminal domain of the Hsp90 class of heat-shock
proteins from a variety of organisms. These Hsp90 subdomains are found in spastin
residues 705-833 and 1773-1895 (Figures 4A and 5A-5C). As discussed below, the
DnaJ and Hsp90 protein classes are both involved in molecular chaperone complexes.
Interestingly, the three large segments also show strong similarity to a BAC clone
recently sequenced as part of the Arabidopsis genome project (GenBank #AB006708).
Specifically, they are homologous to a portion of a 5,871 bp ORF of unknown function
in Arabidopsis (Figure 5A). An alignment of the Hsp90 domain is shown for the first
and second large segments from human, the first segment for mouse, the Arabidopsis
ORF and the yeast Hsp90 (Figure 5B). The highly conserved residues correspond to
regions already identified as highly conserved "signature sequences" in an extensive
phylogenetic analysis of the Hsp90 family (9). Molecular chaperones are known to
function in multiple sub-cellular compartments. A knowledge-based program for
predicting subcellular localization, PSORT II (10), favored a nuclear localization for the
Spastin protein, but the prediction score was relatively weak (47%).

As provided herein, the spastin gene is also conserved in mouse. Homologous
mouse ESTs were identified, including one having a polyadenylation signal. Using
these ESTs to screen a mouse BAC library (CitbCJ7), the mouse spastin gene was
isolated, identified and sequenced. Sequence analysis of the mouse spastin genomic
clone revealed the presence of a huge ORF, which is three nucleotides longer than the
human homologue; thus, the mouse Spastin protein is predicted to be one amino acid
longer. The entire ORF is well conserved between mouse and human, both at the DNA
level (88% homology) and at the protein level (94% identity, 97% similarity). The
areas of high sequence conservation between mouse and human included the two
leucine zippers, two coiled coils, the Hsp90 and DnaJ domains, and the repeated
Arabidopsis ORF homology (Figures 5A-5C). The 3' UTRs show greater divergence
between the mouse and human, but still retain 72% homology. The mouse spastin gene
was mapped to chromosome 1, near D1Mit373, on the basis of radiation hybrid
mapping (LOD score of 25.5) using the Whitehead Institute mouse T31 RH framework
(11,12).

Work described herein strongly supports the conclusion that a frameshift and a nonsense
mutation identified within the spastin gene cause ARSACS. Though the gene appears
to be widely expressed, the truncation of the Spastin protein caused by either
homozygous (6594ΔT/6594ΔT) or compound heterozygous (C5254T/6594ΔT)
genotypes apparently leads to symptoms predominantly affecting the nervous system.
The high level of expression of spastin mRNA in the granular cell layer of the adult rat
cerebellum is especially interesting in light of an earlier observation of the reduced
thickness of the granular layer found during the postmortem examination of tissue from
an ARSACS patient (Bouchard, J-P., In: Handbook of Clinical Neurology 16:
Hereditary neuropathies and spinocerebellar degenerations, pp. 451-459, Elsevier
Science Publishers, Amsterdam (1991)). Thus, the high mRNA expression levels seen
in the CNS indicate a possibly unique role for Spastin in the genesis or maintenance of
neural cell function.

As described herein, sample-sequencing of the ARSACS critical region, in
combination with directed sequencing of specific subclones and computer-aided
analysis, led to the characterization of a very large exon directly from genomic DNA.
This likely represents the entire coding sequence of the spastin gene, as the first
methionine is preceded by an in-frame stop codon 75 bp upstream. RT-PCR
demonstrated that the sequence, from this 75 bp until the polyadenylation site, is
transcribed. Spastin appears to be an intronless gene, although a non-coding upstream
exon cannot be ruled out. The spastin exon of at least 12,794 bp encoding an ORF of
11,487 bp represents the largest exon and the largest ORF within an exon found in any
vertebrate so far. The next largest exons reported are the X (inactive)-specific transcript
(XIST) (11,363 bp), which does not code for a protein (13), and the large central exon of
the mucin gene (MUC5B), which is 10,713 bp long (14).

Intronless ORFs are uncommon and thought to represent at most 5% of human
genes. A few gene families are frequently intronless, including histones and
G-protein-coupled receptors (GPCRs) (15). Members of the Hsp70 family, but not the Hsp90, are
also intronless. The strong conservation among the human and mouse spastin genes
and the unusually large 5,871 bp Arabidopsis ORF suggests both that spastin is ancient
and that the large size of the exon is functionally important.

The presence of similarities to DnaJ and Hsp90 proteins sheds light on spastin's
potential function. Examples of interacting protein pairs having homologues of the two
proteins fused into a single protein are well known in the art (17). Spastin possesses
both the N-terminal domain of the Hsp90 protein class and a DnaJ domain. These two
domains are from proteins that interact in chaperone-mediated protein folding. The
DnaJ motif has long been known to form heterocomplexes with the Hsp70 class of
proteins in a variety of cellular processes, including ATP-dependent folding of target
proteins. The N-terminal domain of the Hsp90 protein class contains an ATP-binding
site that is very similar to the one found in DNA gyrase B (18). More recently, it has
been shown that the yeast DnaJ homologue, YDJ1, physically associates with Hsp90
and this interaction has specific effects on Hsp90 substrates (19). In addition, other
studies have shown that a rabbit DnaJ homologue (p40) interacts with Hsp70 and Hsp90
(both molecular chaperones) to form heterocomplexes known as "foldosomes" (20).
Together, these data suggest that spastin functions in chaperone-mediated protein
folding.

As described herein, the mouse spastin gene was mapped to chromosome 1 near
D1Mit373. A recessive mouse mutation known as tumbler (tb; MGI Accession
ID: 98489) was previously mapped to this region by linkage (21). Tumbler mice had an
ataxia that caused them to "walk in a crab-like fashion." They somersaulted, fell over,
or jumped when trying to go forward. Most of the homozygotes survived and bred (21).
These observations are similar to those seen in ARSACS patients, whose life
expectancy, although reduced (mean age at death is 51 years), still permits some to
survive until the eighth decade. The fertility of affected females seems unchanged, but
because overall nuptiality is low, male fertility has been difficult to assess (Bouchard,
J-P., et al., Neuromuscular Disorders 8:474-479 (1998)). Unfortunately, the tb mouse
line has died out (Mouse Genome Database: URL:http://www.informatics.jax.org/).
However, gene knock-out of the mouse spastin gene could serve to confirm that the tb
mutation was a mutation in the mouse spastin gene.

SEQ ID NOS: referred to herein are as follows. SEQ ID NO: 1 refers to the
complete exon of the human spastin gene as shown in Figures 9A-9F. SEQ ID NO: 2
refers to the protein encoded by the ORF of SEQ ID NO: 1, particularly as shown in
Figures 9A-9F and 5A-5C. SEQ ID NO: 3 refers to the complete exon of the murine
spastin gene as shown in Figures 8A-8G. SEQ ID NO: 4 refers to the protein encoded
by the ORF of SEQ ID NO: 3, particularly as shown in Figures 9A-9F and 5A-5C. SEQ
ID NOS: 5 and 6 are intentionally omitted. SEQ ID NO: 7 refers to a nucleotide
sequence which is identical to SEQ ID NO: 1 except for a deletion of a thymine at
position 6594. SEQ ID NO: 8 refers to the protein encoded by the ORF of SEQ ID NO:
7. SEQ ID NO: 9 refers to a nucleotide sequence which is identical to SEQ ID NO: 1
except for a substitution of a thymine for a cytosine at position 5254. SEQ ID NO: 10
refers to the protein encoded by the ORF of SEQ ID NO: 9. SEQ ID NOS: 11, 12, 13
and 14 refer to nucleotide sequences which are identical to SEQ ID NO: 1 except for a
substitution of a thymine for a cytosine at position 3945, substitution of a cytosine for a
thymine at position 6603, substitution of a thymine for a cytosine at position 7731, and
substitution of a thymine for a cytosine at position 10054, respectively. SEQ ID NO: 15
refers to a nucleotide sequence which is identical to SEQ ID NO: 1 except for
substitution of a cytosine for a thymine at position 7856. SEQ ID NO: 16 refers to the
protein encoded by the ORF of SEQ ID NO: 15. The sequences corresponding to all
other SEQ ID NOS: used herein are shown throughout the application.
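
By way of illustration only, the relationship between SEQ ID NO: 1 and the variant sequences listed above can be sketched in a few lines of Python. The positions and base changes in the table are those recited above; the class and function names, and the short toy reference used in the usage note, are hypothetical and are not part of the disclosure. Positions are assumed to be 1-based.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One change relative to the reference (positions assumed 1-based)."""
    position: int  # position in the reference sequence
    ref: str       # base expected in the reference at that position
    alt: str       # base that replaces it; "" means the base is deleted


# Changes relative to SEQ ID NO: 1, as recited in the text above.
VARIANTS_BY_SEQ_ID = {
    7: Variant(6594, "T", ""),     # deletion of a thymine
    9: Variant(5254, "C", "T"),    # thymine substituted for a cytosine
    11: Variant(3945, "C", "T"),
    12: Variant(6603, "T", "C"),   # cytosine substituted for a thymine
    13: Variant(7731, "C", "T"),
    14: Variant(10054, "C", "T"),
    15: Variant(7856, "T", "C"),
}


def apply_variant(reference: str, variant: Variant) -> str:
    """Return a copy of `reference` carrying the single change in `variant`."""
    index = variant.position - 1
    if not 0 <= index < len(reference):
        raise ValueError(f"position {variant.position} is outside the reference")
    found = reference[index].upper()
    if found != variant.ref:
        raise ValueError(
            f"expected {variant.ref} at position {variant.position}, found {found}"
        )
    # An empty `alt` removes the base; otherwise the base is replaced.
    return reference[:index] + variant.alt + reference[index + 1:]
```

For example, with a made-up 14-nucleotide reference `ATGCCTGACTTAGC`, `apply_variant(ref, Variant(5, "C", "T"))` models a C-to-T substitution and `apply_variant(ref, Variant(10, "T", ""))` models a single-base deletion. The check against the expected reference base is a design choice that guards against off-by-one errors in position numbering.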

As appropriate, the isolated nucleic acid molecules of the present invention can 
be RNA, for example, mRNA, or DNA, such as cDNA and genomic DNA. DNA 
molecules can be double-stranded or single-stranded; single stranded RNA or DNA can 
be either the coding, or sense, strand or the non-coding, or antisense, strand. The nucleic 
acid molecule can include all or a portion of the coding sequence of a gene and can
further comprise additional non-coding sequences such as introns and non-coding 3' and
5' sequences (including regulatory sequences, for example). Additionally, the nucleic
acid molecule can be fused to a marker sequence, for example, a sequence that encodes
a polypeptide to assist in isolation or purification of the polypeptide. Such sequences
include, but are not limited to, those which encode a glutathione-S-transferase (GST)
fusion protein and those which encode a hemagglutinin (HA) polypeptide marker from
influenza. 

As used herein, "isolated" is intended to mean that the isolated item is not in the 
form or environment in which it exists in nature. For example, an "isolated" nucleic
acid molecule, as used herein, is one that is separated from nucleic acid which normally
flanks the nucleic acid molecule in nature. With regard to genomic DNA, the term
"isolated" refers to nucleic acid molecules which are separated from the chromosome
with which the genomic DNA is naturally associated. For example, the isolated nucleic
acid molecule can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb
of nucleotides which flank the nucleic acid molecule in the genomic DNA of the cell 
from which the nucleic acid is derived. 

Moreover, an isolated nucleic acid of the invention, such as a cDNA or RNA 
molecule, can be substantially free of other cellular material, or culture medium when
produced by recombinant techniques, or chemical precursors or other chemicals when
chemically synthesized. However, the nucleic acid molecule can be fused to other
coding or regulatory sequences and still be considered isolated. In some instances, the
isolated material will form part of a composition (for example, a crude extract
containing other substances), buffer system or reagent mix. In other circumstances, the
material may be purified to essential homogeneity, for example as determined by PAGE
or column chromatography such as HPLC. Preferably, an isolated nucleic acid 
comprises at least about 50, 80 or 90% (on a molar basis) of all macromolecular species 
present. 

Further, recombinant DNA contained in a vector is included in the definition of
"isolated" as used herein. Also, isolated nucleic acid molecules include recombinant
DNA molecules in heterologous host cells, as well as partially or substantially purified 
DNA molecules in solution. "Isolated" nucleic acid molecules also encompass in vivo 
and in vitro RNA transcripts of the DNA molecules of the present invention. 

The invention further provides variants of the isolated nucleic acid molecules of
the invention. Such variants can be naturally occurring, such as allelic variants (same
locus), homologs (different locus), and orthologs (different organism), or may be
constructed by recombinant DNA methods or by chemical synthesis. Such
non-naturally occurring variants can be made using well-known mutagenesis
techniques, including those applied to polynucleotides, cells, or organisms.
Accordingly, variants can contain nucleotide substitutions, deletions, inversions and/or
insertions in either or both the coding and non-coding region of the nucleic acid 
molecule. Further, the variations can produce both conservative and non-conservative 
amino acid substitutions. 

Typically, variants have a substantial identity with a nucleic acid molecule
disclosed herein and the complements thereof. Particularly preferred are nucleic acid
molecules and fragments which have at least about 60%, preferably at least about 70, 80
or 85%, more preferably at least about 90%, even more preferably at least about 95%,
and most preferably at least about 98% identity with nucleic acid molecules described
herein.

Such nucleic acid molecules can be readily identified as being able to hybridize 
under stringent conditions to a nucleotide sequence selected from the group consisting 
of SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14, 15, 17-66, 72 and 73 and the complements 
thereof. In one embodiment, the variants hybridize under high stringency hybridization
conditions (e.g., for selective hybridization) to a nucleotide sequence selected from
SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14, 15, 17-66, 72 and 73.

A general description of stringent hybridization conditions is provided in
Ausubel, F.M., et al., Current Protocols in Molecular Biology, Greene Publishing
Assoc. and Wiley-Interscience 1989, the teachings of which are incorporated herein by
reference. Factors such as probe length, base composition, percent mismatch between
the hybridizing sequences, temperature and ionic strength influence the stability of
nucleic acid hybrids. Thus, stringency conditions sufficient to identify the
polynucleotides of the present invention (e.g., high or moderate stringency conditions)
can be determined empirically, depending in part upon the characteristics of the known
DNA to which other unknown nucleic acids are being compared for sequence similarity.
Equivalent conditions can be determined by varying one or more of these parameters
while maintaining a similar degree of identity or similarity between the two nucleic acid
molecules. Typically, conditions are used such that sequences at least about 60%, at
least about 70%, at least about 80%, at least about 90% or at least about 95% or more
identical to each other remain hybridized to one another. 

Alternatively, conditions for stringency are as described in WO 98/40404, the 
teachings of which are incorporated herein by reference. In particular, examples of 
5 highly stringent, stringent, reduced and least stringent conditions are provided in WO 
98/40404 in the Table on page 36. In one embodiment, highly stringent conditions are 
those that are at least as stringent as, for example, 1x SSC at 65°C, or 1x SSC and 50%
formamide at 42°C. Moderate stringency conditions are those that are at least as
stringent as 4x SSC at 65°C, or 4x SSC and 50% formamide at 42°C. Reduced
stringency conditions are those that are at least as stringent as 4x SSC at 50°C, or 6x SSC
and 50% formamide at 40°C.
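
As a rough illustration of why a lower temperature with formamide can be treated as comparable to a higher temperature without it, the following Python sketch estimates the melting temperature of a long DNA:DNA hybrid under each pair of conditions recited above. The formula is a widely used textbook approximation and is not taken from this application or from WO 98/40404; the sodium concentration assigned to 1x SSC (0.15 M NaCl plus 0.015 M trisodium citrate), the 50% GC content, the 1000 bp hybrid length, and all function names are assumptions made for the example.

```python
import math

# Assumed Na+ concentration of 1x SSC: 0.15 M NaCl + 3 x 0.015 M from citrate.
SODIUM_MOLAR_PER_1X_SSC = 0.195


def estimate_tm_celsius(ssc_fold, percent_gc, length_bp, percent_formamide=0.0):
    """Approximate Tm of a long DNA:DNA hybrid (empirical textbook formula)."""
    sodium_molar = ssc_fold * SODIUM_MOLAR_PER_1X_SSC
    return (
        81.5
        + 16.6 * math.log10(sodium_molar)  # higher salt stabilizes the hybrid
        + 0.41 * percent_gc                # GC-rich hybrids melt at higher temperature
        - 500.0 / length_bp                # short hybrids are less stable
        - 0.61 * percent_formamide         # formamide lowers the Tm
    )


def degrees_below_tm(ssc_fold, temperature_c, percent_formamide=0.0,
                     percent_gc=50.0, length_bp=1000):
    """Margin between the estimated Tm and the wash or hybridization temperature.

    A smaller margin corresponds to a more stringent condition.
    """
    tm = estimate_tm_celsius(ssc_fold, percent_gc, length_bp, percent_formamide)
    return tm - temperature_c


# (label, fold SSC, temperature in degrees C, percent formamide), from the text.
CONDITIONS = [
    ("highly stringent, aqueous", 1.0, 65.0, 0.0),
    ("highly stringent, formamide", 1.0, 42.0, 50.0),
    ("moderate, aqueous", 4.0, 65.0, 0.0),
    ("moderate, formamide", 4.0, 42.0, 50.0),
    ("reduced, aqueous", 4.0, 50.0, 0.0),
    ("reduced, formamide", 6.0, 40.0, 50.0),
]

if __name__ == "__main__":
    for label, ssc, temp, formamide in CONDITIONS:
        margin = degrees_below_tm(ssc, temp, formamide)
        print(f"{label}: about {margin:.1f} degrees C below the estimated Tm")
```

The estimate is only a guide. As noted above, suitable conditions depend on the particular sequences being compared and can be determined empirically.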

The percent identity of two nucleotide or amino acid sequences can be 
determined by aligning the sequences for optimal comparison purposes (e.g., gaps can be
introduced in the sequence of a first sequence). The nucleotides or amino acids at
corresponding positions are then compared, and the percent identity between the two
sequences is a function of the number of identical positions shared by the sequences (i.e.,
% identity = # of identical positions/total # of positions x 100). In certain embodiments,
the length of a sequence aligned for comparison purposes is at least 30%, preferably at
least 40%, more preferably at least 60%, and even more preferably at least 70%, 80% or
90% of the length of the reference sequence. The actual comparison of the two
sequences can be accomplished by well-known methods, for example, using a
mathematical algorithm. A preferred, non-limiting example of such a mathematical
algorithm is described in Karlin et al., Proc. Natl. Acad. Sci. USA, 90:5873-5877 (1993).
Such an algorithm is incorporated into the NBLAST and XBLAST programs (version
2.0) as described in Altschul et al., Nucleic Acids Res., 25:3389-3402 (1997). When
utilizing BLAST and Gapped BLAST programs, the default parameters of the respective
programs (e.g., NBLAST) can be used. See http://www.ncbi.nlm.nih.gov. In one
embodiment, parameters for sequence comparison can be set at score=100,
wordlength=12, or can be varied (e.g., W=5 or W=20).

The present invention also provides isolated nucleic acids that contain a fragment
or portion that hybridizes under highly stringent conditions to a nucleotide sequence
selected from the group consisting of SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14, 15, 17-66,
72 and 73 described herein and the complements of these SEQ ID NOS. The nucleic
acid fragments of the invention are at least about 15, preferably at least about 18, 20, 23
or 25 nucleotides, and can be 30, 40, 50, 100, 200 or more nucleotides in length. Longer
fragments, for example, 30 or more nucleotides in length, which encode antigenic
proteins or polypeptides described herein are useful.

In a related aspect, the nucleic acid fragments of the invention are used as probes
or primers in assays such as those described herein. "Probes" are oligonucleotides that
hybridize in a base-specific manner to a complementary strand of nucleic acid. Such
probes include peptide nucleic acids, as described in Nielsen et al., Science, 254,
1497-1500 (1991). Typically, a probe comprises a region of nucleotide sequence that
hybridizes under highly stringent conditions to at least about 15, typically about 20-25,
and more typically about 40, 50 or 75 consecutive nucleotides of a nucleic acid molecule
of the invention. More typically, the probe further comprises a label, e.g., radioisotope,
fluorescent compound, enzyme, or enzyme co-factor.

As used herein, the term "primer" refers to a single-stranded oligonucleotide
which acts as a point of initiation of template-directed DNA synthesis using well-known
methods (e.g., PCR, LCR) including, but not limited to those described herein. The
appropriate length of the primer depends on the particular use, but typically ranges from
about 15 to 30 nucleotides. The term "primer site" refers to the area of the target DNA
to which a primer hybridizes. The term "primer pair" refers to a set of primers including
a 5' (upstream) primer that hybridizes with the 5' end of the nucleic acid sequence to be
amplified and a 3' (downstream) primer that hybridizes with the complement of the
sequence to be amplified.
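
The notion of a primer pair can be illustrated with a minimal Python sketch. It assumes the usual convention in which the template is given as the sense strand written 5' to 3', the upstream primer matches the first bases of the region to be amplified, and the downstream primer is the reverse complement of its last bases. The function names, the default primer length, and the toy template in the usage note are hypothetical. Real primer design would also consider melting temperature, secondary structure and specificity, none of which is modeled here.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return sequence.translate(_COMPLEMENT)[::-1]


def primer_pair(template: str, start: int, end: int, primer_length: int = 20):
    """Pick a naive (upstream, downstream) primer pair for template[start..end].

    `start` and `end` are 1-based, inclusive positions on the sense strand.
    Both primers are returned 5' to 3'.
    """
    if not 1 <= start <= end <= len(template):
        raise ValueError("region lies outside the template")
    region = template[start - 1:end]
    if len(region) < primer_length:
        raise ValueError("region is shorter than the requested primer length")
    upstream = region[:primer_length]                         # same as the sense strand
    downstream = reverse_complement(region[-primer_length:])  # anneals to the sense strand
    return upstream, downstream
```

For instance, with the made-up template `GGATCCATGAAACCCGGGTTTTAAGAATTC`, calling `primer_pair(template, 7, 24, primer_length=6)` selects the region `ATGAAACCCGGGTTTTAA` and returns one primer from each end of it.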

The nucleic acid molecules of the invention such as those described above can be 
identified and isolated using standard molecular biology techniques and the sequence 
information provided herein. For example, nucleic acid molecules can be amplified and
isolated by the polymerase chain reaction using synthetic oligonucleotide primers
designed based on one or more of the sequences provided herein and the complements
thereof. See generally PCR Technology: Principles and Applications for DNA
Amplification (ed. H.A. Erlich, Freeman Press, NY, NY, 1992); PCR Protocols: A Guide
to Methods and Applications (Eds. Innis, et al., Academic Press, San Diego, CA, 1990);
Mattila et al., Nucleic Acids Res., 19:4967 (1991); Eckert et al., PCR Methods and
Applications, 1:17 (1991); PCR (eds. McPherson et al., IRL Press, Oxford); and U.S.
Patent 4,683,202. The nucleic acid molecules can be amplified using cDNA, mRNA or
genomic DNA as a template, cloned into an appropriate vector and characterized by
DNA sequence analysis.

Other suitable amplification methods include the ligase chain reaction (LCR) (see
Wu and Wallace, Genomics, 4:560 (1989); Landegren et al., Science, 241:1077 (1988)),
transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA, 86:1173 (1989)),
self-sustained sequence replication (Guatelli et al., Proc. Natl. Acad. Sci. USA,
87:1874 (1990)) and nucleic acid based sequence amplification (NASBA). The latter
two amplification methods involve isothermal reactions based on isothermal
transcription, which produce both single stranded RNA (ssRNA) and double stranded
DNA (dsDNA) as the amplification products in a ratio of about 30 or 100 to 1,
respectively.

For example, the amplified DNA can be radiolabelled and used as a probe for
screening a cDNA library derived from fibroblast or brain, e.g., human fibroblast or
brain, mRNA in ZAP Express, ZIPLOX or other suitable vector. Corresponding clones
can be isolated, DNA can be obtained following in vivo excision, and the cloned insert can
be sequenced in either or both orientations by art recognized methods to identify the
correct reading frame encoding a protein of the appropriate molecular weight. For
example, the direct analysis of the nucleotide sequence of nucleic acid molecules of the
present invention can be accomplished using well-known methods that are commercially
available. See, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual
(2nd Ed., CSHP, New York 1989); Zyskind et al., Recombinant DNA Laboratory
Manual (Acad. Press, 1988). Using these or similar methods, the protein(s) and the
DNA encoding the protein can be isolated, sequenced and further characterized. 

Antisense nucleic acids of the invention can be designed using the nucleotide 
sequences described herein, and constructed using chemical synthesis and enzymatic 
ligation reactions using procedures known in the art. For example, an antisense nucleic
acid (e.g., an antisense oligonucleotide) can be chemically synthesized using naturally
occurring nucleotides or variously modified nucleotides designed to increase the
biological stability of the molecules or to increase the physical stability of the duplex
formed between the antisense and sense nucleic acids, e.g., phosphorothioate derivatives
and acridine substituted nucleotides can be used.

In general, the isolated nucleic acid sequences can be used as molecular weight 
markers on Southern gels, and as chromosome markers which are labeled to map related 
gene positions. The nucleic acid sequences can also be used to compare with endogenous 
DNA sequences in patients to identify genetic disorders, and as probes, such as to
hybridize and discover related DNA sequences or to subtract out known sequences from
a sample. The nucleic acid sequences can further be used to derive primers for genetic
fingerprinting, to raise anti-protein antibodies using DNA immunization techniques, and
as an antigen to raise anti-DNA antibodies or elicit immune responses. Additionally, the
nucleotide sequences of the invention can be used to identify and express recombinant
proteins for analysis, characterization or therapeutic use, or as markers for tissues in
which the corresponding protein is expressed, either constitutively, during tissue 
differentiation, or in diseased states. 

The invention also relates to constructs which comprise a vector into which a 
sequence of the invention has been inserted in a sense or antisense orientation. As used
herein, the term "vector" refers to a nucleic acid molecule capable of transporting another
nucleic acid to which it has been linked. One type of vector is a "plasmid", which refers
to a circular double stranded DNA loop into which additional DNA segments can be
ligated. Another type of vector is a viral vector, wherein additional DNA segments can
be ligated into the viral genome. Certain vectors are capable of autonomous replication
in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial
origin of replication and episomal mammalian vectors). Other vectors (e.g.,
non-episomal mammalian vectors) are integrated into the genome of a host cell upon
introduction into the host cell, and thereby are replicated along with the host genome.
Moreover, certain vectors, expression vectors, are capable of directing the expression of
genes to which they are operably linked. In general, expression vectors of utility in
recombinant DNA techniques are often in the form of plasmids (vectors). However, the
invention is intended to include such other forms of expression vectors, such as viral
vectors (e.g., replication defective retroviruses, adenoviruses and adeno-associated
viruses) that serve equivalent functions.

Preferred recombinant expression vectors of the invention comprise a nucleic acid 
of the invention in a form suitable for expression of the nucleic acid in a host cell. This 
means that the recombinant expression vectors include one or more regulatory sequences,
selected on the basis of the host cells to be used for expression, which are operably linked
to the nucleic acid sequence to be expressed. Within a recombinant expression vector,
"operably linked" is intended to mean that the nucleotide sequence of interest is linked to
the regulatory sequence(s) in a manner which allows for expression of the nucleotide
sequence (e.g., in an in vitro transcription/translation system or in a host cell when the
vector is introduced into the host cell). The term "regulatory sequence" is intended to
include promoters, enhancers and other expression control elements (e.g.,
polyadenylation signals). Such regulatory sequences are described, for example, in
Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press,
San Diego, CA (1990). Regulatory sequences include those which direct constitutive
expression of a nucleotide sequence in many types of host cell and those which direct
expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific 
regulatory sequences). It will be appreciated by those skilled in the art that the design of 
the expression vector can depend on such factors as the choice of the host cell to be 
transformed, the level of expression of protein desired, etc.

The expression vectors of the invention can be introduced into host cells to
thereby produce proteins or peptides, including fusion proteins or peptides, encoded by
nucleic acids as described herein. The recombinant expression vectors of the invention
can be designed for expression of a polypeptide of the invention in prokaryotic or
eukaryotic cells, e.g., bacterial cells such as E. coli, insect cells (using baculovirus
expression vectors), yeast cells or mammalian cells. Suitable host cells are discussed
further in Goeddel, supra. Alternatively, the recombinant expression vector can be 
transcribed and translated in vitro, for example using T7 promoter regulatory sequences 
and T7 polymerase. 

Another aspect of the invention pertains to host cells into which a recombinant
expression vector of the invention has been introduced. The terms "host cell" and
"recombinant host cell" are used interchangeably herein. It is understood that such terms
refer not only to the particular subject cell but also to the progeny or potential progeny of
such a cell. Because certain modifications may occur in succeeding generations due to
either mutation or environmental influences, such progeny may not, in fact, be identical
to the parent cell, but are still included within the scope of the term as used herein. 

A host cell can be any prokaryotic or eukaryotic cell. For example, a nucleic acid 
of the invention can be expressed in bacterial cells (e.g., E. coli), insect cells, yeast or 
mammalian cells (such as Chinese hamster ovary cells (CHO) or COS cells). Other
suitable host cells are known to those skilled in the art.

Vector DNA can be introduced into prokaryotic or eukaryotic cells via 
conventional transformation or transfection techniques. As used herein, the terms 
"transformation" and "transfection" are intended to refer to a variety of art-recognized 
techniques for introducing foreign nucleic acid (e.g., DNA) into a host cell, including
calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated
transfection, lipofection, or electroporation. Suitable methods for transforming or
transfecting host cells can be found in Sambrook, et al. (supra), and other laboratory
manuals.

A host cell of the invention, such as a prokaryotic or eukaryotic host cell in culture, can
be used to produce (i.e., express) a polypeptide of the invention. Accordingly, the
invention further provides methods for producing a polypeptide using the host cells of the
invention. In one embodiment, the method comprises culturing the host cell of the invention
(into which a recombinant expression vector encoding a polypeptide of the invention has
been introduced) in a suitable medium such that the polypeptide is produced. In another 
embodiment, the method further comprises isolating the polypeptide from the medium or 
the host cell. 

The host cells of the invention can also be used to produce non-human transgenic
animals. For example, in one embodiment, a host cell of the invention is a fertilized
oocyte or an embryonic stem cell into which a nucleic acid of the invention has been
introduced. Such host cells can then be used to create non-human transgenic animals in
which exogenous nucleotide sequences have been introduced into their genome or
homologous recombinant animals in which endogenous nucleotide sequences have been
altered. Such animals are useful for studying the function and/or activity of the
nucleotide sequence and polypeptide encoded by the sequence and for identifying and/or
evaluating modulators of their activity. As used herein, a "transgenic animal" is a
non-human animal, preferably a mammal, more preferably a rodent such as a rat or
mouse, in which one or more of the cells of the animal includes a transgene. Other
examples of transgenic animals include non-human primates, sheep, dogs, cows, goats,
chickens, amphibians, etc. A transgene is exogenous DNA which is integrated into the
genome of a cell from which a transgenic animal develops and which remains in the
genome of the mature animal, thereby directing the expression of an encoded gene
product in one or more cell types or tissues of the transgenic animal. As used herein, an
"homologous recombinant animal" is a non-human animal, preferably a mammal, more
preferably a mouse, in which an endogenous gene has been altered by homologous 
recombination between the endogenous gene and an exogenous DNA molecule 
introduced into a cell of the animal, e.g., an embryonic cell of the animal, prior to 
development of the animal.

A transgenic animal of the invention can be created by introducing a nucleic acid
of the invention into the male pronuclei of a fertilized oocyte, e.g., by microinjection or
retroviral infection, and allowing the oocyte to develop in a pseudopregnant female foster
animal. The sequence can be introduced as a transgene into the genome of a non-human
animal. Intronic sequences and polyadenylation signals can also be included in the
transgene to increase the efficiency of expression of the transgene. A tissue-specific
regulatory sequence(s) can be operably linked to the transgene to direct expression of a
polypeptide in particular cells. Methods for generating transgenic animals via embryo
manipulation and microinjection, particularly animals such as mice, have become
conventional in the art and are described, for example, in U.S. Patent Nos. 4,736,866 and
4,870,009, U.S. Patent No. 4,873,191 and in Hogan, Manipulating the Mouse Embryo
(Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1986). Similar
methods are used for production of other transgenic animals. A transgenic founder
animal can be identified based upon the presence of the transgene in its genome and/or
expression of mRNA in tissues or cells of the animals. A transgenic founder animal can
then be used to breed additional animals carrying the transgene. Moreover, transgenic
animals carrying the transgene can further be bred to other
transgenic animals carrying other transgenes.

The present invention also provides isolated polypeptides and variants and
fragments thereof that are encoded by the nucleic acid molecules of the invention. For
example, as described above, the nucleotide sequences can be used to design primers to 
clone and express cDNAs encoding the polypeptides of the invention. 

As used herein, a polypeptide is said to be "isolated" or "purified" when it is 
substantially free of cellular material when it is isolated from recombinant and
non-recombinant cells, or free of chemical precursors or other chemicals when it is
chemically synthesized. A polypeptide, however, can be joined to another polypeptide 
with which it is not normally associated in a cell and still be "isolated" or "purified." 

The polypeptides of the invention can be purified to homogeneity. It is 
understood, however, that preparations in which the polypeptide is not purified to
homogeneity are useful and considered to contain an isolated form of the polypeptide.
The critical feature is that the preparation allows for the desired function of the
polypeptide, even in the presence of considerable amounts of other components. Thus,
the invention encompasses various degrees of purity. In one embodiment, the language
"substantially free of cellular material" includes preparations of the polypeptide having
less than about 30% (by dry weight) other proteins (i.e., contaminating protein), less than 
about 20% other proteins, less than about 10% other proteins, or less than about 5% other 
proteins. 

When a polypeptide is recombinantly produced, it can also be substantially free of
culture medium, i.e., culture medium represents less than about 20%, less than about
10%, or less than about 5% of the volume of the protein preparation. The language
"substantially free of chemical precursors or other chemicals" includes preparations of
the polypeptide in which it is separated from chemical precursors or other chemicals that
are involved in its synthesis. In one embodiment, the language "substantially free of
chemical precursors or other chemicals" includes preparations of the polypeptide having
less than about 30% (by dry weight) chemical precursors or other chemicals, less than 
about 20% chemical precursors or other chemicals, less than about 10% chemical 
precursors or other chemicals, or less than about 5% chemical precursors or other 
chemicals. 

In one embodiment, a polypeptide comprises an amino acid sequence selected
from the group consisting of SEQ ID NOS: 2, 4, 8, 10 and 16 and the complements
thereof. However, the invention also encompasses sequence variants. Variants include a
substantially homologous protein encoded by the same genetic locus in an organism, i.e.,
an allelic variant. Variants also encompass proteins derived from other genetic loci in an
organism, but having substantial homology to a polypeptide of the invention. Variants
also include proteins substantially homologous to these polypeptides but derived from
another organism, an ortholog. Variants also include proteins that are substantially
homologous to these polypeptides that are produced by chemical synthesis. Variants also
include proteins that are substantially homologous or identical to these polypeptides that
are produced by recombinant methods. 

As used herein, two proteins (or a region of the proteins) are substantially 
homologous or identical when the amino acid sequences are at least about 45-55%, 
typically at least about 70-75%, more typically at least about 80-85%, and most typically
at least about 90-95% or more homologous or identical. A substantially homologous
amino acid sequence, according to the present invention, will be encoded by a nucleic
acid hybridizing to a nucleic acid sequence described herein, or portion thereof, under
stringent conditions as more fully described above.

To determine the percent homology or identity of two amino acid sequences, or of
two nucleic acids, the sequences are aligned for optimal comparison purposes (e.g., gaps
can be introduced in the sequence of one protein or nucleic acid for optimal alignment
with the other protein or nucleic acid). The amino acid residues or nucleotides at
corresponding amino acid positions or nucleotide positions are then compared. When a
position in one sequence is occupied by the same amino acid residue or nucleotide as the
corresponding position in the other sequence, then the molecules are homologous at that
position. As used herein, amino acid or nucleic acid "homology" is equivalent to amino
acid or nucleic acid "identity". The percent homology between the two sequences is a
function of the number of identical positions shared by the sequences (i.e., percent
homology equals the number of identical positions/total number of positions times 100).
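
The calculation just described can be sketched in Python as follows. The sketch assumes that the two sequences have already been aligned by some other means (for example, by one of the programs mentioned herein), that both aligned strings have the same length, and that gaps are written as "-". It counts every alignment column as a position, except columns in which both sequences have a gap. The function name and these conventions are illustrative assumptions. Other programs may define the denominator differently.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    percent identity = identical positions / total positions x 100
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    identical = 0
    total = 0
    for residue_a, residue_b in zip(aligned_a.upper(), aligned_b.upper()):
        if residue_a == gap and residue_b == gap:
            continue  # a column that is a gap in both sequences carries no information
        total += 1
        if residue_a == residue_b:
            identical += 1  # neither can be a gap here, so this is a true match

    if total == 0:
        raise ValueError("alignment contains no comparable positions")
    return 100.0 * identical / total
```

As a usage example, `percent_identity("ACGT-ACGT", "ACGTTACCT")` compares nine aligned positions, of which seven are identical.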

The invention also encompasses polypeptides having a lower degree of identity 
but having sufficient similarity so as to perform one or more of the same functions 
performed by a polypeptide encoded by a nucleic acid of the invention. Similarity is 
determined by conserved amino acid substitution. Such substitutions are those that
substitute a given amino acid in a polypeptide by another amino acid of like
characteristics. Conservative substitutions are likely to be phenotypically silent.
Typically seen as conservative substitutions are the replacements, one for another, among
the aliphatic amino acids Ala, Val, Leu, and Ile; interchange of the hydroxyl residues Ser
and Thr; exchange of the acidic residues Asp and Glu; substitution between the amide
residues Asn and Gln; exchange of the basic residues Lys and Arg; and replacements
among the aromatic residues Phe and Tyr. Guidance concerning which amino acid changes
are likely to be phenotypically silent is found in Bowie et al., Science 247:1306-1310
(1990). 
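
The groupings listed above lend themselves to a simple lookup, sketched here in Python using one-letter amino acid codes. Only the six groups recited above are encoded; the function name and the three return labels are hypothetical, and any residue not named in one of the groups is simply treated as having no conservative partner.

```python
# Conservative substitution groups as recited above (one-letter codes).
CONSERVATIVE_GROUPS = {
    "aliphatic": frozenset("AVLI"),  # Ala, Val, Leu, Ile
    "hydroxyl": frozenset("ST"),     # Ser, Thr
    "acidic": frozenset("DE"),       # Asp, Glu
    "amide": frozenset("NQ"),        # Asn, Gln
    "basic": frozenset("KR"),        # Lys, Arg
    "aromatic": frozenset("FY"),     # Phe, Tyr
}


def classify_substitution(original: str, replacement: str) -> str:
    """Classify a single-residue change as identical, conservative or not."""
    original = original.upper()
    replacement = replacement.upper()
    if original == replacement:
        return "identical"
    for members in CONSERVATIVE_GROUPS.values():
        if original in members and replacement in members:
            return "conservative"
    return "non-conservative"
```

For example, `classify_substitution("L", "I")` falls within the aliphatic group, whereas `classify_substitution("K", "D")` crosses from the basic group to the acidic group.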

Preferred computer program methods to determine identity and similarity
between two sequences include, but are not limited to, GCG program package (Devereux,
J., et al., Nucleic Acids Res., 12(1):387 (1984)), BLASTP, BLASTN, FASTA (Altschul,
S.F., et al., J. Molec. Biol., 215:403 (1990)).

A variant polypeptide can differ in amino acid sequence by one or more
substitutions, deletions, insertions, inversions, fusions, and truncations or a combination
of any of these. Further, variant polypeptides can be fully functional or can lack function
in one or more activities. Fully functional variants typically contain only conservative
variation or variation in non-critical residues or in non-critical regions. Functional
variants can also contain substitution of similar amino acids that result in no change or an
insignificant change in function. Alternatively, such substitutions may positively or
negatively affect function to some degree.

Non-functional variants typically contain one or more non-conservative amino
acid substitutions, deletions, insertions, inversions, or truncation or a substitution,
insertion, inversion, or deletion in a critical residue or critical region. As indicated,
variants can be naturally-occurring or can be made by recombinant means or chemical
synthesis to provide useful and novel characteristics for the polypeptide. This includes
preventing immunogenicity from pharmaceutical formulations by preventing protein
aggregation.

Amino acids that are essential for function can be identified by methods known in
the art, such as site-directed mutagenesis or alanine-scanning mutagenesis (Cunningham
et al., Science, 244:1081-1085 (1989)). The latter procedure introduces single alanine
mutations at every residue in the molecule. The resulting mutant molecules are then
tested for biological activity in vitro, or in vitro proliferative activity. Sites that are
critical for polypeptide activity can also be determined by structural analysis such as
crystallization, nuclear magnetic resonance or photoaffinity labeling (Smith et al., J. Mol.
Biol., 224:899-904 (1992); de Vos et al., Science, 255:306-312 (1992)).

The invention also includes polypeptide fragments or portions of the polypeptides
of the invention, as well as fragments of the variants of the polypeptides described herein.
As used herein, a fragment comprises at least 6 contiguous amino acids. Useful
fragments include those that retain one or more of the biological activities of the
polypeptide as well as fragments that can be used as an immunogen to generate
polypeptide-specific antibodies.

Biologically active fragments (peptides which are, for example, 6, 9, 12, 15, 20,
30, 35, 36, 37, 38, 39, 40, 50, 100 or more amino acids in length) can comprise a domain,
segment, or motif that has been identified by analysis of the polypeptide sequence using
well-known methods, e.g., signal peptides, extracellular domains, one or more
transmembrane segments or loops, ligand binding regions, zinc finger domains, DNA
binding domains, acylation sites, glycosylation sites, or phosphorylation sites.

The invention also provides fragments with immunogenic properties. These
contain an epitope-bearing portion of the polypeptides and variants of the invention.
These epitope-bearing peptides are useful to raise antibodies that bind specifically to a
polypeptide or region or fragment. These peptides can contain at least 6, 7, 8, 9, 12, at
least 14, or between at least about 15 to about 30 amino acids. The epitope-bearing
peptide and polypeptides may be produced by any conventional means (Houghten, R.A.,
Proc. Natl. Acad. Sci. USA, 82:5131-5135 (1985)). Simultaneous multiple peptide
synthesis is described in U.S. Patent No. 4,631,211.

Fragments can be discrete (not fused to other amino acids or polypeptides) or can
be within a larger polypeptide. Further, several fragments can be comprised within a
single larger polypeptide. In one embodiment a fragment designed for expression in a
host can have heterologous pre- and pro-polypeptide regions fused to the amino terminus
of the polypeptide fragment and an additional region fused to the carboxyl terminus of
the fragment.




The invention thus provides chimeric or fusion proteins. These comprise a
polypeptide of the invention operatively linked to a heterologous protein having an amino
acid sequence not substantially homologous to the polypeptide. "Operatively linked"
indicates that the polypeptide protein and the heterologous protein are fused in-frame.
The heterologous protein can be fused to the N-terminus or C-terminus of the
polypeptide. In one embodiment the fusion protein does not affect function of the
polypeptide per se. For example, the fusion protein can be a GST-fusion protein in
which the polypeptide sequences are fused to the C-terminus of the GST sequences.

The isolated polypeptide can be purified from cells that naturally express it, such as from
mammary epithelium, purified from cells that have been altered to express it
(recombinant), or synthesized using known protein synthesis methods.

In one embodiment, the protein is produced by recombinant DNA techniques.
For example, a nucleic acid molecule encoding the polypeptide is cloned into an
expression vector, the expression vector introduced into a host cell and the protein
expressed in the host cell. The protein can then be isolated from the cells by an
appropriate purification scheme using standard protein purification techniques.

Polypeptides often contain amino acids other than the 20 amino acids commonly
referred to as the 20 naturally-occurring amino acids. Further, many amino acids,
including the terminal amino acids, may be modified by natural processes, such as
processing and other post-translational modifications, or by chemical modification
techniques well known in the art. Common modifications that occur naturally in
polypeptides are described in basic texts, detailed monographs, and the research
literature, and they are well known to those of skill in the art.

Accordingly, the polypeptides also encompass derivatives or analogs in which a
substituted amino acid residue is not one encoded by the genetic code, in which a
substituent group is included, in which the mature polypeptide is fused with another
compound, such as a compound to increase the half-life of the polypeptide (for example,
polyethylene glycol), or in which the additional amino acids are fused to the mature
polypeptide, such as a leader or secretory sequence or a sequence for purification of the
mature polypeptide or a pro-protein sequence.

In general, polypeptides or proteins of the present invention can be used as a
molecular weight marker on SDS-PAGE gels or on molecular sieve gel filtration columns
using art-recognized methods. The polypeptides of the present invention can be used to
raise antibodies or to elicit an immune response. The polypeptides can also be used as a
reagent, e.g., a labeled reagent, in assays to quantitatively determine levels of the protein
or a molecule to which it binds (e.g., a receptor or a ligand) in biological fluids. The
polypeptides can also be used as markers for tissues in which the corresponding protein is
preferentially expressed, either constitutively, during tissue differentiation, or in a
diseased state. The polypeptides can be used to isolate a corresponding binding partner,
e.g., receptor or ligand, such as, for example, in an interaction trap assay, and to screen
for peptide or small molecule antagonists or agonists of the binding interaction.

In another aspect, the invention provides antibodies to the polypeptides and
polypeptide fragments of the invention. The term "antibody" as used herein refers to
immunoglobulin molecules and immunologically active portions of immunoglobulin
molecules, i.e., molecules that contain an antigen binding site that specifically binds an
antigen. A molecule that specifically binds to a polypeptide of the invention is a
molecule that binds to that polypeptide or a fragment thereof, but does not substantially
bind other molecules in a sample, e.g., a biological sample, which naturally contains the
polypeptide. Examples of immunologically active portions of immunoglobulin
molecules include F(ab) and F(ab')2 fragments which can be generated by treating the
antibody with an enzyme such as pepsin. The invention provides polyclonal and
monoclonal antibodies that bind to a polypeptide of the invention; such antibodies can be
made using methods known in the art. The term "monoclonal antibody" or "monoclonal
antibody composition", as used herein, refers to a population of antibody molecules that
contain only one species of an antigen binding site capable of immunoreacting with a
particular epitope of a polypeptide of the invention. A monoclonal antibody composition
thus typically displays a single binding affinity for a particular polypeptide of the
invention with which it immunoreacts.

Additionally, recombinant antibodies, such as chimeric and humanized
monoclonal antibodies, comprising both human and non-human portions, which can be
made using standard recombinant DNA techniques, are within the scope of the invention.
Such chimeric and humanized monoclonal antibodies can be produced by recombinant
DNA techniques known in the art, for example using methods described in PCT
Publication No. WO 87/02671; European Patent Application 184,187; European Patent
Application 171,496; European Patent Application 173,494; PCT Publication No. WO
86/01533; U.S. Patent No. 4,816,567; European Patent Application 125,023; Better et al.
(1988) Science, 240:1041-1043; Liu et al. (1987) Proc. Natl. Acad. Sci. USA,
84:3439-3443; Liu et al. (1987) J. Immunol., 139:3521-3526; Sun et al. (1987) Proc.
Natl. Acad. Sci. USA, 84:214-218; Nishimura et al. (1987) Canc. Res., 47:999-1005;
Wood et al. (1985) Nature, 314:446-449; Shaw et al. (1988) J. Natl. Cancer Inst.,
80:1553-1559; Morrison (1985) Science, 229:1202-1207; Oi et al. (1986)
Bio/Techniques, 4:214; U.S. Patent 5,225,539; Jones et al. (1986) Nature, 321:552-525;
Verhoeyan et al. (1988) Science, 239:1534; and Beidler et al. (1988) J. Immunol.,
141:4053-4060.

In general, antibodies of the invention (e.g., a monoclonal antibody) can be used
to isolate a polypeptide of the invention by standard techniques, such as affinity
chromatography or immunoprecipitation. A polypeptide-specific antibody can facilitate
the purification of natural polypeptide from cells and of recombinantly produced
polypeptide expressed in host cells. Moreover, an antibody specific for a polypeptide of
the invention can be used to detect the polypeptide (e.g., in a cellular lysate, cell
supernatant, or tissue sample) in order to evaluate the abundance and pattern of
expression of the polypeptide. Antibodies can be used diagnostically to monitor protein
levels in tissue as part of a clinical testing procedure, e.g., to determine the
efficacy of a given treatment regimen. Detection can be facilitated by coupling the
antibody to a detectable substance. Examples of detectable substances include various
enzymes, prosthetic groups, fluorescent materials, luminescent materials, bioluminescent
materials, and radioactive materials. Examples of suitable enzymes include horseradish
peroxidase, alkaline phosphatase, β-galactosidase, or acetylcholinesterase; examples of
suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin;
examples of suitable fluorescent materials include umbelliferone, fluorescein, fluorescein
isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or
phycoerythrin; an example of a luminescent material includes luminol; examples of
bioluminescent materials include luciferase, luciferin, and aequorin; and examples of
suitable radioactive material include 125I, 131I, 35S or 3H.

The present invention also pertains to diagnostic assays and prognostic assays
used for prognostic (predictive) purposes to thereby treat an individual prophylactically.
Accordingly, one aspect of the present invention relates to diagnostic assays for
determining protein and/or nucleic acid expression as well as activity of proteins of the
invention, in the context of a biological sample (e.g., blood, serum, cells, tissue) to
thereby determine whether an individual is afflicted with a disease or disorder, or is at
risk of developing a disorder, e.g., a neurodegenerative disorder such as ARSACS,
associated with aberrant expression or activity. The invention also provides for
prognostic (or predictive) assays for determining whether an individual is at risk of
developing a disorder associated with activity or expression of proteins or nucleic acids
of the invention.

Disorders which may be treated or diagnosed by methods described herein
include, but are not limited to, neurodegenerative disease comprising one or more
symptoms or effects selected from the group consisting of: reduced sensory nerve
conduction, reduced motor nerve velocity, hypermyelination of retinal nerve fibers,
atrophy of upper cerebellar vermis, absence of Purkinje cells and abnormal neuronal lipid
storage. The invention is particularly suited to treat and diagnose ARSACS.

Another aspect of the invention pertains to monitoring the influence of agents
(e.g., drugs, compounds) on the expression or activity of proteins of the invention in
clinical trials.

An exemplary method for detecting the presence or absence of proteins or nucleic
acids of the invention in a biological sample involves obtaining a biological sample from
a test subject and contacting the biological sample with a compound or an agent capable
of detecting the protein, or nucleic acid (e.g., mRNA, genomic DNA) that encodes the
protein, such that the presence of the protein or nucleic acid is detected in the biological
sample. A preferred agent for detecting mRNA or genomic DNA is a labeled nucleic
acid probe capable of hybridizing to mRNA or genomic DNA sequences described
herein. The nucleic acid probe can be, for example, a full-length nucleic acid, or a
portion thereof, such as an oligonucleotide of at least 15, 30, 50, 100, 250 or 500
nucleotides in length and sufficient to specifically hybridize under stringent conditions to
appropriate mRNA or genomic DNA. Other suitable probes for use in the diagnostic
assays of the invention are described herein.

In one embodiment, the agent for detecting proteins of the invention is an
antibody capable of binding to the protein, preferably an antibody with a detectable label.
Antibodies can be polyclonal, or more preferably, monoclonal. An intact antibody, or a
fragment thereof (e.g., Fab or F(ab')2) can be used. The term "labeled", with regard to the
probe or antibody, is intended to encompass direct labeling of the probe or antibody by
coupling (i.e., physically linking) a detectable substance to the probe or antibody, as well
as indirect labeling of the probe or antibody by reactivity with another reagent that is
directly labeled. Examples of indirect labeling include detection of a primary antibody
using a fluorescently labeled secondary antibody and end-labeling of a DNA probe with
biotin such that it can be detected with fluorescently labeled streptavidin. In a preferred
embodiment, the antibody is able to distinguish between complete or nearly complete
proteins and truncated versions of the same protein.

The term "biological sample" is intended to include tissues, cells and biological
fluids isolated from a subject, as well as tissues, cells and fluids present within a subject.
For example, the sample can be obtained from a tissue selected from the group consisting
of: brain tissue, CNS, lung, fetal lung, testis, lymphocytes, adipose, fibroblasts, skeletal
muscle, pancreas, uterus, kidney, tonsil, embryo and isolated cells thereof. That is, the
detection method of the invention can be used to detect mRNA, protein, or genomic
DNA of the invention in a biological sample in vitro as well as in vivo. For example, in
vitro techniques for detection of mRNA include Northern hybridizations and in situ
hybridizations. In vitro techniques for detection of protein include enzyme linked
immunosorbent assays (ELISAs), Western blots, immunoprecipitations and
immunofluorescence. In vitro techniques for detection of genomic DNA include
Southern hybridizations. Furthermore, in vivo techniques for detection of protein include
introducing into a subject a labeled anti-protein antibody. For example, the antibody can
be labeled with a radioactive marker whose presence and location in a subject can be
detected by standard imaging techniques.

In one embodiment, the biological sample contains protein molecules from the
test subject. Alternatively, the biological sample can contain mRNA molecules from the
test subject or genomic DNA molecules from the test subject. A preferred biological
sample is a serum sample or mammary epithelium isolated by conventional means from a
subject. A nucleic acid sample is a sample, e.g., a biological sample, which contains
nucleic acid molecules.

The invention also encompasses kits for detecting the presence of proteins or
nucleic acid molecules of the invention in a biological sample. For example, the kit can
comprise a labeled compound or agent capable of detecting protein or mRNA in a
biological sample; means for determining the amount of protein or mRNA in the sample;
and means for comparing the amount of protein or mRNA in the sample with a standard.
The compound or agent can be packaged in a suitable container. The kit can further
comprise instructions for using the kit to detect protein or nucleic acid.

The diagnostic methods described herein can also be utilized to identify subjects
having or at risk of developing a disease or disorder associated with aberrant expression
or activity of proteins and nucleic acid molecules of the invention. For example, the
assays described herein can be utilized to identify a subject having or at risk of
developing a disorder associated with Spastin protein or spastin nucleic acid expression
or activity such as a neurodegenerative disorder. Thus, the present invention provides a
method for identifying a disease or disorder associated with aberrant expression or
activity of proteins or nucleic acid molecules of the invention, in which a test sample is
obtained from a subject and protein or nucleic acid molecule (e.g., mRNA, genomic
DNA) is detected, wherein the presence of an altered protein or nucleic acid molecule is
diagnostic for a subject having or at risk of developing a disease or disorder associated
with aberrant expression or activity of the protein or nucleic acid sequence of the
invention. In certain embodiments as described herein, it is valuable to determine the
genotype of an individual, particularly where a specific allelic form is associated with
disease. For example, it will be valuable for purposes of diagnosis to determine an
individual's genotype for the C5254T mutation with respect to ARSACS diagnosis, i.e.,
to identify alteration in the spastin gene or Spastin protein.

Detection of the alteration can involve the use of a probe/primer in a polymerase
chain reaction (PCR) (see, e.g., U.S. Patent Nos. 4,683,195 and 4,683,202), such as
anchor PCR or RACE PCR, or, alternatively, in a ligation chain reaction (LCR) (see, e.g.,
Landegran et al. (1988) Science, 241:1077-1080; and Nakazawa et al. (1994) PNAS,
91:360-364), the latter of which can be particularly useful for detecting point mutations
(see Abravaya et al. (1995) Nucleic Acids Res., 23:675-682). This method can include
the steps of collecting a sample of cells from a patient, isolating nucleic acid molecules
(e.g., genomic, mRNA or both) from the cells of the sample, contacting the nucleic acid
sample with one or more primers which specifically hybridize to the gene under
conditions such that hybridization and amplification of the gene (if present) occurs, and
detecting the presence or absence of an amplification product, or detecting the size of the
amplification product and comparing the length to a control sample. It is anticipated that
PCR and/or LCR may be desirable to use as a preliminary amplification step in
conjunction with any of the techniques used for detecting mutations described herein. In
one embodiment allele-specific primers are utilized.

Alternative amplification methods include: self sustained sequence replication
(Guatelli, J.C. et al. (1990) Proc. Natl. Acad. Sci. USA, 87:1874-1878), transcriptional
amplification system (Kwoh, D.Y. et al. (1989) Proc. Natl. Acad. Sci. USA,
86:1173-1177), Q-Beta Replicase (Lizardi, P.M. et al. (1988) Bio/Technology, 6:1197),
or any other nucleic acid amplification method, followed by the detection of the
amplified molecules using techniques well known to those of skill in the art. These
detection schemes are especially useful for the detection of nucleic acid molecules if such
molecules are present in very low numbers.

In an alternative embodiment, mutations in a given gene from a sample cell can
be identified by alterations in restriction enzyme cleavage patterns. For example, sample
and control DNA are isolated, amplified (optionally), digested with one or more restriction
endonucleases, and fragment length sizes are determined by gel electrophoresis and
compared. Differences in fragment length sizes between sample and control DNA
indicate mutations in the sample DNA. Moreover, sequence specific
ribozymes (see, for example, U.S. Patent No. 5,498,531) can be used to score for the
presence of specific mutations, e.g., the C5254T mutation, by development or loss of a
ribozyme cleavage site.
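For illustration only, the comparison of fragment lengths between sample and control DNA can be sketched as an in silico digest. The sequences below are invented, the EcoRI recognition site (GAATTC, cut after the first G) is used purely as a familiar example, and the function name `digest` is hypothetical; none of these is taken from the present disclosure.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting a linear sequence at every
    occurrence of `site`, `cut_offset` bases into the site."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(len(seq) - start)  # trailing fragment
    return fragments


# Hypothetical control and sample sequences; the sample carries a single
# base change (A to G) that destroys the recognition site.
control = "AAAA" + "GAATTC" + "CCCC"
sample = "AAAA" + "GAGTTC" + "CCCC"

control_fragments = digest(control, "GAATTC", 1)
sample_fragments = digest(sample, "GAATTC", 1)
mutation_indicated = control_fragments != sample_fragments
```

In this sketch the control yields two fragments and the sample yields one uncut fragment, so the differing fragment pattern indicates a change within the recognition site.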

In other embodiments, genetic mutations can be identified by hybridizing a
sample and control nucleic acids, e.g., DNA or RNA, to high density arrays containing
many oligonucleotide probes (Cronin, M.T. et al. (1996) Human Mutation, 7:244-255;
Kozal, M.J. et al. (1996) Nature Medicine, 2:753-759). For example, genetic mutations
can be identified in two dimensional arrays containing light-generated DNA probes as
described in Cronin, M.T. et al., supra. Briefly, a first hybridization array of probes can
be used to scan through long stretches of DNA in a sample and control to identify base
changes between the sequences by making linear arrays of sequential overlapping probes.
This step allows the identification of point mutations. This step is followed by a second
hybridization array that allows the characterization of specific mutations by using
smaller, specialized probe arrays complementary to all variants or mutations detected.
Each mutation array is composed of parallel probe sets, one complementary to the
wild-type gene and the other complementary to the mutant gene.
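The notion of a linear array of sequential overlapping probes can be sketched as a sliding window over a reference sequence. The function name `tile_probes`, the probe length and the step size are assumptions for this example, and for simplicity the sketch returns windows of the reference strand itself rather than their complements.

```python
def tile_probes(reference: str, probe_len: int, step: int = 1) -> list[str]:
    """Return sequential overlapping windows of `reference`.

    With step < probe_len, adjacent probes overlap, so every base of the
    reference is interrogated by more than one probe (except near the ends).
    """
    if probe_len <= 0 or step <= 0:
        raise ValueError("probe_len and step must be positive")
    if probe_len > len(reference):
        raise ValueError("probe_len exceeds reference length")
    return [
        reference[i:i + probe_len]
        for i in range(0, len(reference) - probe_len + 1, step)
    ]
```

A six-base reference tiled with four-base probes at a step of one yields three overlapping probes.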

In yet another embodiment, any of a variety of sequencing reactions known in the
art can be used to directly sequence the gene and detect specific mutations by comparing
the sequence of the gene from the sample with the corresponding wild-type (control)
gene sequence. Examples of sequencing reactions include those based on techniques
developed by Maxam and Gilbert ((1977) PNAS, 74:560) or Sanger ((1977) PNAS,
74:5463). It is also contemplated that any of a variety of automated sequencing
procedures can be utilized when performing the diagnostic assays ((1995) Biotechniques,
19:448), including sequencing by mass spectrometry (see, e.g., PCT International
Publication No. WO 94/16101; Cohen et al. (1996) Adv. Chromatogr., 36:127-162; and
Griffin et al. (1993) Appl. Biochem. Biotechnol., 38:147-159).

Other methods for detecting mutations include methods in which protection from
cleavage agents is used to detect mismatched bases in RNA/RNA or RNA/DNA
heteroduplexes (Myers et al. (1985) Science, 230:1242). In general, the art technique of
"mismatch cleavage" starts by providing heteroduplexes formed by hybridizing (labeled)
RNA or DNA containing the wild-type sequence with potentially mutant RNA or DNA
obtained from a tissue sample. The double-stranded duplexes are treated with an agent
that cleaves single-stranded regions of the duplex such as those which will exist due to base
pair mismatches between the control and sample strands. For instance, RNA/DNA
duplexes can be treated with RNase and DNA/DNA hybrids treated with S1 nuclease to
enzymatically digest the mismatched regions. After digestion of the mismatched regions,
the resulting material is then separated by size on denaturing polyacrylamide gels to
determine the site of mutation. See, for example, Cotton et al. (1988) Proc. Natl. Acad.
Sci. USA, 85:4397; Saleeba et al. (1992) Methods Enzymol., 217:286-295. In certain
embodiments, the control DNA or RNA can be labeled for detection.

In still another embodiment, the mismatch cleavage reaction employs one or more
proteins that recognize mismatched base pairs in double-stranded DNA (so called "DNA
mismatch repair" enzymes) in defined systems for detecting and mapping point mutations
in cDNAs obtained from samples of cells. For example, the mutY enzyme of E. coli
cleaves A at G/A mismatches and the thymidine DNA glycosylase from HeLa cells
cleaves T at G/T mismatches (Hsu et al. (1994) Carcinogenesis, 15:1657-1662).
According to an exemplary embodiment, a probe based on a nucleotide sequence of the
invention is hybridized to a cDNA or other DNA product from a test cell(s). The duplex
is treated with a DNA mismatch repair enzyme, and the cleavage products, if any, can be
detected from electrophoresis protocols or the like. See, for example, U.S. Patent No.
5,459,039.

In other embodiments, alterations in electrophoretic mobility will be used to
identify mutations in nucleic acid molecules described herein. For example, single strand
conformation polymorphism (SSCP) may be used to detect differences in electrophoretic
mobility between mutant and wild type nucleic acids (Orita et al. (1989) Proc. Natl.
Acad. Sci. USA, 86:2766; see also Cotton (1993) Mutat. Res., 285:125-144; and Hayashi
(1992) Genet. Anal. Tech. Appl., 9:73-79). Single-stranded DNA fragments of sample
and control nucleic acids will be denatured and allowed to renature. The secondary
structure of single-stranded nucleic acids varies according to sequence, and the resulting
alteration in electrophoretic mobility enables the detection of even a single base change.
The DNA fragments may be labeled or detected with labeled probes. The sensitivity of
the assay may be enhanced by using RNA (rather than DNA), in which the secondary
structure is more sensitive to a change in sequence. In one embodiment, the subject
method utilizes heteroduplex analysis to separate double stranded heteroduplex
molecules on the basis of changes in electrophoretic mobility (Keen et al. (1991) Trends
Genet., 7:5).

In yet another embodiment the movement of mutant or wild-type fragments in
polyacrylamide gels containing a gradient of denaturant is assayed using denaturing
gradient gel electrophoresis (DGGE) (Myers et al. (1985) Nature, 313:495). When
DGGE is used as the method of analysis, DNA will be modified to ensure that it does not
completely denature, for example by adding a GC clamp of approximately 40 bp of
high-melting GC-rich DNA by PCR. In a further embodiment, a temperature gradient is
used in place of a denaturing gradient to identify differences in the mobility of control
and sample DNA (Rosenbaum and Reissner (1987) Biophys. Chem., 265:12753).

Examples of other techniques for detecting point mutations include, but are not
limited to, selective oligonucleotide hybridization, selective amplification, or selective
primer extension. For example, oligonucleotide primers may be prepared in which the
known mutation is placed centrally and then hybridized to target DNA under conditions
which permit hybridization only if a perfect match is found (Saiki et al. (1986) Nature,
324:163; Saiki et al. (1989) Proc. Natl. Acad. Sci. USA, 86:6320). Such allele-specific
oligonucleotides are hybridized to PCR amplified target DNA.
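A minimal sketch of preparing a pair of allele-specific oligonucleotides with the known mutation placed centrally is given below. The reference sequence, coordinates and flank length are invented for the example, and the function name `allele_specific_oligos` is hypothetical; the sketch does not use the actual spastin sequence.

```python
def allele_specific_oligos(
    reference: str, pos: int, alt: str, flank: int
) -> tuple[str, str]:
    """Return (wild_type_oligo, mutant_oligo), each of length 2*flank + 1,
    with the variant base at the center.

    `pos` is a 0-based index into `reference`; `alt` is the mutant base.
    """
    if len(alt) != 1:
        raise ValueError("alt must be a single base")
    if pos - flank < 0 or pos + flank >= len(reference):
        raise ValueError("not enough flanking sequence around pos")
    left = reference[pos - flank:pos]
    right = reference[pos + 1:pos + 1 + flank]
    wild_type = left + reference[pos] + right
    mutant = left + alt + right
    return wild_type, mutant


# Hypothetical example: a G-to-T change at index 4 of an invented sequence.
wt_oligo, mut_oligo = allele_specific_oligos("AAACGGTTT", 4, "T", 2)
```

The two oligonucleotides differ only at the central position, which is the position where a single mismatch is most destabilizing under stringent hybridization conditions.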

Alternatively, allele specific amplification technology that depends on selective
PCR amplification may be used in conjunction with the instant invention.
Oligonucleotides used as primers for specific amplification may carry the mutation of
interest in the center of the molecule (so that amplification depends on differential
hybridization) (Gibbs et al. (1989) Nucleic Acids Res., 17:2437-2448) or at the extreme 3'
end of one primer where, under appropriate conditions, mismatch can prevent or reduce
polymerase extension (Prossner (1993) Tibtech, 11:238). In addition it may be desirable
to introduce a novel restriction site in the region of the mutation to create cleavage-based
detection (Gasparini et al. (1992) Mol. Cell Probes, 6:1). It is anticipated that in certain
embodiments amplification may also be performed using Taq ligase for amplification
(Barany (1991) Proc. Natl. Acad. Sci. USA, 88:189). In such cases, ligation will occur
only if there is a perfect match at the 3' end of the 5' sequence, making it possible to detect
the presence of a known mutation at a specific site by looking for the presence or absence
of amplification. Single base extension (SBE) and SBE fluorescence resonance energy
transfer (SBE-FRET) can also be used to identify the specific nucleotide which occupies
a given position in a nucleic acid molecule.

The methods described herein may be performed, for example, by utilizing
pre-packaged diagnostic kits comprising at least one probe nucleic acid molecule or
antibody reagent described herein, which may be conveniently used, e.g., in clinical
settings to diagnose patients exhibiting symptoms or family history of a disease or illness
involving a gene of the present invention. Any cell type or tissue in which the gene is
expressed may be utilized in the prognostic assays described herein.

The invention will now be described by the following non-limiting examples.
The teachings of all references cited herein are incorporated herein by reference in their
entirety.

EXEMPLIFICATION

Methods

All subjects participating in this study gave informed consent according to
institutional and national standards (29). Sequence analysis was performed on 24
ARSACS patients from 17 families.

BAC/PAC DNA Preparation

Small quantities of DNA were prepared from BAC and PAC cell cultures (12.5 µl
Chloramphenicol for BACs, 30 µg/ml Kanamycin for PACs) using a modified alkaline
lysis procedure according to a published protocol (30). Larger quantities of DNA for the
construction of libraries and direct sequencing were prepared using Qiagen (Qiagen,
Valencia, CA) or Nucleobond columns (The Nest Group, Southboro, MA) according to
the manufacturers' protocols.

M13 Library Construction and Preparation of M13 Single-Stranded DNA

BAC and PAC DNA was sheared in a sonicator to an average size of 2 kb and the
ends were made blunt with Mung Bean Nuclease (New England Biolabs, Beverly, MA).
The fragments were gel-purified, and subcloned into an M13mp18 Sma I-cut
dephosphorylated cloning vector (Amersham, Uppsala, Sweden). Ligation reactions
were transformed into XL2-Blue competent cells (Stratagene, La Jolla, CA). Phage
plaques of M13 subclones from the BACs and PACs were grown overnight in 0.5 ml of
2x YT media with 10 µl of log phase TG-1 cells. Single-stranded M13 DNA for
sequencing was purified from 100 µl of the culture supernatant with magnetic beads
(PerSeptive Diagnostics, Cambridge, MA) according to the manufacturer's instructions.

Sequencing

Fluorescent sequencing of PCR products and M13 single-stranded DNA was 
accomplished using the Dye Primer Cycle Sequencing Ready Reaction kit (Perkin Elmer, 
Foster City, CA). Sequencing reactions contained approximately 400 ng of template in 
1.5 µl and 3 µl of assay mixture for each primer. The thermal cycling parameters for the 
sequencing reactions were: 96°C for 10 seconds, 55°C for 5 seconds, and 70°C for 1 
minute (15 cycles), followed by 96°C for 10 seconds and 70°C for 1 minute (15 cycles). 
Reaction products for each primer were combined and purified by ethanol 
precipitation. Sequence samples were prepared, loaded, and run on ABI 377 sequencers 
according to the manufacturer's instructions (Perkin Elmer). The sequences were 
assembled into contigs and analyzed with the STADEN software package (version 
1997.1) (31, 32) and AutoAssembler (version 2.0) (Perkin Elmer). Direct sequencing of 
BACs was accomplished with Dye Terminator chemistry according to a previously 
published protocol (33). The sequence of the entire mouse and human ORFs was verified 
by either sequencing unambiguously on both strands or by sequencing a single strand 
with both the Dye Primer and the Dye Terminator reaction systems. All sequences were 
compared with GenBank databases and dbEST using the Search Launcher Batch Client 
software for Macintosh from Baylor College of Medicine (34) with RepeatMasker 
pre-screening. 

Computational Analyses 

Web-based sequence analysis included (using default parameters): 

BLAST: http://www.ncbi.nlm.nih.gov/cgi-bin/BLAST/nph-newblast?Jform=1; 
FASTA: http://www.ebi.ac.uk/searches/fasta.html; 
PSORT: http://psort.nibb.ac.jp:8800; 
EXPASY Proteomics tools: http://www.expasy.ch/tools/; 
BCM Search Launcher: http://www.hgsc.bcm.tmc.edu/SearchLauncher/; 
mac-search-launcher: ftp://dot.bcm.tmc.edu/pub/software/search-launcher/; 
COILS (35) web server: http://www.ch.embnet.org/software/COILS_form.html. 

Mutation Analysis 

50 ng of genomic DNA, extracted from peripheral blood leukocytes, was 
amplified using the primers in Figure 7. Primer pairs were designed using the web-based 
version of the Primer 3.0 program and PCR reactions were individually optimized. The 
resulting products were purified using magnetic beads (PerSeptive Diagnostics) 
according to the manufacturer's instructions and sequenced as above. 

RNA Preparation and Northern Blot Analysis 

Total RNA was extracted using the guanidinium/CsCl method from skin 
fibroblast cell lines from ARSACS patients and a control individual; the cell lines were 
grown in Eagle modified MEM (CellGro, Herndon, VA) with 10% FBS (Canadian Life 
Technology, Burlington, Ontario). 10 µg of RNA was electrophoresed in a 1% agarose 
gel and then transferred to a nylon membrane (Magna Charge, MSI, Westboro, MA) by 
capillary transfer with 20x SSC buffer. Pre-transfer alkaline hydrolysis of the gel was 
performed with 0.05 M NaOH. The 32P-labeled spastin probe was generated by random 
priming with the Rediprime II system (Amersham) using the 1.8 kb insert from an 
IMAGE cDNA clone (279258) purified after separation on low melting point agarose 
(Life Technologies, Rockville, MD). Hybridization for both the fibroblast blot and the 
multiple tissue northern blot (MTN, Human I #7760-1, Clontech) was done in 
ExpressHyb buffer (Clontech, Palo Alto, CA) followed by washing according to 
manufacturer's instructions. The size standard for both northern blots was a 0.24-9.5 kb 
RNA ladder (Life Technologies). 

RT-PCR 

500 ng of total RNA from skin fibroblasts of ARSACS patients and controls, as 
well as a commercial preparation of total RNA from cerebellum (Clontech), were 
amplified using sense and antisense primers (Figure 7) and the Superscript one step kit 
(Life Technologies). In all cases a parallel control reaction was set up in the absence of 
RT. The resulting products were purified and sequenced as above. 

In situ hybridizations 

Oligonucleotides complementary to nucleotides 11,009-11,055 of the human 
spastin gene (probe NIB226-1) and a sense 45-mer for the same region were synthesized 
and purified (MedProbe, Oslo, Norway). To exclude the possibility of any 
cross-hybridization to other human mRNAs, homology searches were carried out. A database 
search revealed no significant homologies, except for the intended targets. The 
oligonucleotides were subsequently labeled with 35S-labeled dATP (NEG 034H, NEN 
DuPont, Boston, MA) at the 3' end using terminal deoxynucleotidyl transferase to a 
specific activity of 2 x 10^ cpm/µg and purified on a Nensorb 20 column. 
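
As an illustration of how an antisense oligonucleotide relates to its target region, the following minimal Python sketch derives the reverse complement of a 1-based, inclusive coordinate range. The function names and the short example sequence are hypothetical and chosen only for illustration; the sketch is not the procedure used to design probe NIB226-1.

```python
# Minimal sketch (assumption: plain A/C/G/T sequence, 1-based inclusive coordinates).
# Function names and the example sequence are hypothetical illustrations.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def antisense_probe(target: str, start: int, end: int) -> str:
    """Return an oligonucleotide complementary to target[start..end].

    Coordinates are 1-based and inclusive, as in the text above.
    """
    if not 1 <= start <= end <= len(target):
        raise ValueError("coordinates fall outside the target sequence")
    sense = target[start - 1:end]      # sense-strand segment
    return reverse_complement(sense)   # antisense oligo, written 5' to 3'


# Hypothetical 12-nt target; positions 1-6 are "ATGAAT".
example_target = "ATGAATACATTC"
example_probe = antisense_probe(example_target, 1, 6)
```

The sense control oligonucleotide for the same region would simply be the unmodified slice of the target sequence.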

The tissue was cut to 14 µm thickness in a cryostat, thawed onto Fisher ProbeOn 
(+) slides (Fisher Biotech, Springfield, NJ), and processed for in situ hybridization 
according to Schalling et al. (36). In brief, sections were incubated at 42°C for 15-18 
hours with 10^ cpm of labeled probe per 100 µl of a solution containing 50% formamide, 
4x SSC, 1x Denhardt's solution, 1% sarcosyl, 0.02 M sodium phosphate (NaPO4, pH 7.0) 
and 10% dextran sulfate mixed with 500 µg/ml sonicated salmon sperm DNA and 200 
mM dithiothreitol. Sections were rinsed in 1x SSC at 55°C for one hour, dried and 
exposed to x-ray film (Amersham Hyperfilm β-max) for 14-21 days. 

Mouse BAC Clone and Radiation Hybrid (RH) Panel Analysis 

The clone containing the mouse genomic sequence (418_B_11) is from a 129 SV 
mouse BAC library, CitbCJ7B, cloned in the vector pBeloBAC11 (Research Genetics, 
Huntsville, AL). The RH mapping of mouse spastin was performed using the T31 
mouse-hamster hybrid mapping panel (11). The initial attempts with several mouse 
spastin primers failed due to the amplification of a hamster PCR product of similar size 
to the mouse product. A hamster PCR product was sequenced, which revealed minor 
sequence differences with mouse spastin. The successful mouse spastin primers were 
MARS-3F (TCATTCATATGTCGCAGGGAGATGT; SEQ ID NO: 72) and MARS-3R 
(CTACTAGAAGTGCATGTGGGGG; SEQ ID NO: 73). The RH vector obtained from 
testing the T31 panel was compared to the reference map generated at MIT (12) using the 
"placement" function of RHMAPPER. 

Computation of the Pexcess Statistic 

Seven-marker haplotypes for 55 ARSACS and 58 normal chromosomes were 
obtained from 68 obligate carrier parents by not counting copies that were considered to 
be identical by descent within a pedigree (5). Marker haplotypes were constructed using 
the SIMWALK2 program (37). The simple linkage disequilibrium mapping measure 
Pexcess = (Paffected - Pnormal)/(1 - Pnormal) (23, 38, 39, 40) was calculated from the 
frequencies of haplotypes. 
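
The Pexcess calculation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name is hypothetical, and the carrier counts below are invented for the example (only the chromosome totals of 55 and 58 come from the text); they are not results from this study.

```python
# Minimal sketch of the Pexcess measure:
#   Pexcess = (Paffected - Pnormal) / (1 - Pnormal)
# The counts used in the example are hypothetical, not data from the study.


def p_excess(p_affected: float, p_normal: float) -> float:
    """Excess frequency of an allele or haplotype on affected chromosomes."""
    if not 0.0 <= p_affected <= 1.0:
        raise ValueError("p_affected must lie in [0, 1]")
    if not 0.0 <= p_normal < 1.0:
        raise ValueError("p_normal must lie in [0, 1)")
    return (p_affected - p_normal) / (1.0 - p_normal)


# Hypothetical example: a haplotype seen on 40 of 55 affected chromosomes
# and on 6 of 58 normal chromosomes.
affected_freq = 40 / 55
normal_freq = 6 / 58
example_value = p_excess(affected_freq, normal_freq)
```

A value near 1 indicates that the haplotype is almost exclusive to affected chromosomes, while a value near 0 indicates no enrichment relative to normal chromosomes.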

While this invention has been particularly shown and described with references to 
preferred embodiments thereof, it will be understood by those skilled in the art that 
various changes in form and details may be made therein without departing from the 
scope of the invention encompassed by the appended claims. 

CLAIMS 

What is claimed is: 

1. An isolated nucleic acid molecule comprising a nucleotide sequence selected 
from the group consisting of: 

a) SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14 and 15; and 

b) the complement of SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14 and 15. 

2. An isolated nucleic acid molecule comprising an exon from a vertebrate gene 
wherein said exon is at least 1 150 base pairs in length. 

3. An isolated nucleic acid molecule according to Claim 2, wherein said gene is a 
human gene. 

4. An isolated nucleic acid molecule according to Claim 2, wherein said gene is a 
spastin gene. 

5. An isolated nucleic acid molecule consisting of a nucleotide sequence selected 
from the group consisting of: 

a) SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14 and 15; and 

b) the complement of SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14 and 15. 

6. An isolated portion of a nucleic acid sequence selected from the group consisting 
of: 

a) SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14 and 15; and 

b) the complement of SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14 and 15, 

wherein the portion is at least about 10 nucleotides in length. 

7. A nucleic acid molecule comprising a nucleotide sequence which is at least about 
60% identical to a nucleotide sequence selected from the group consisting of: 

a) SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14 and 15; and 

b) the complement of SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14 and 15. 

8. A nucleic acid molecule which hybridizes under high stringency conditions to a 
nucleotide sequence selected from the group consisting of: 

a) SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14 and 15; and 

b) the complement of SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14 and 15. 

9. A nucleic acid construct comprising the isolated nucleic acid molecule of 
Claim 1. 

10. The nucleic acid construct of Claim 9 wherein the isolated nucleic acid molecule 
is operatively linked to a regulatory sequence. 

11. A recombinant host cell comprising the isolated nucleic acid molecule of Claim 1. 

12. The recombinant host cell of Claim 11 wherein the isolated nucleic acid is 
operatively linked to a regulatory sequence. 

13. A method for preparing a polypeptide encoded by an isolated nucleic acid 
molecule, comprising culturing the recombinant host cell of Claim 12. 

14. An isolated polypeptide encoded by an isolated nucleic acid molecule according 
to Claim 1. 

15. An isolated polypeptide encoded by an isolated nucleic acid molecule according 
to Claim 5. 

16. An antibody, or an antigen-binding fragment thereof, which selectively binds to 
the polypeptide encoded by an isolated nucleic acid molecule according to Claim 
1, or to a portion of said polypeptide. 

17. A method for assaying the presence of a nucleic acid molecule in a sample, 
comprising contacting said sample with a nucleotide sequence selected from the 
group consisting of: 

a) SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14, 15, 17-66, 72 and 73; 

b) the complement of SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14, 15, 17-66, 
72 and 73; 

c) a portion of any one of SEQ ID NOS: 1, 3, 7, 9, 11, 12, 13, 14, 15, 
17-66, 72 and 73 which is at least 10 nucleotides in length; and 

d) a portion of the complement of any one of SEQ ID NOS: 1, 3, 7, 9, 
11, 12, 13, 14, 15, 17-66, 72 and 73 which is at least 10 
nucleotides in length under conditions appropriate for selective 
hybridization. 

18. A method for assaying the presence of a polypeptide encoded by an isolated 
nucleic acid molecule according to Claim 1 in a sample, comprising contacting 
said sample with an antibody which specifically binds to the encoded polypeptide. 

19. An isolated polypeptide comprising an amino acid sequence selected from the 
group consisting of SEQ ID NOS: 2, 4, 8, 10, 16 and 67-69. 

20. An isolated polypeptide comprising an amino acid sequence having greater than 
75% identity to an amino acid sequence selected from the group consisting of 
SEQ ID NOS: 2, 4, 8, 10, 16 and 67-69. 

21. An antibody which specifically binds to the polypeptide of Claim 19. 

22. An antibody which specifically binds to the polypeptide of Claim 20. 

23. An isolated nucleic acid molecule consisting of a nucleotide sequence selected 
from the group consisting of: 

a) SEQ ID NOS: 21-66; and 
b) the complement of SEQ ID NOS: 21-66. 

24. A method of diagnosing or aiding in the diagnosis of neurodegenerative disease in 
an individual comprising 

a) obtaining a nucleic acid sample from the individual; and 

b) determining the nucleotide present at nucleotide position 5254 of SEQ ID 
NO: 1, 

wherein presence of a thymine at said position is indicative of increased 
likelihood of neurodegenerative disease in the individual as compared with an 
individual having a cytosine at said position. 

25. The method of Claim 24, wherein said neurodegenerative disease comprises one 
or more symptoms selected from the group consisting of: reduced sensory nerve 
conduction, reduced motor nerve velocity, hypermyelination of retinal nerve 
fibers, atrophy of upper cerebellar vermis, absence of Purkinje cells and abnormal 
neuronal lipid storage. 

26. The method of Claim 24, wherein the nucleic acid sample is obtained from a 
tissue selected from the group consisting of: brain tissue, CNS, lung, fetal lung, 
testis, lymphocytes, adipose, fibroblasts, skeletal muscle, pancreas, uterus, 
kidney, tonsil, embryo and isolated cells thereof. 

27. The method of Claim 26, wherein said brain tissue is selected from the group 
consisting of cerebral cortex, granular cell layer of the cerebellum and 
hippocampus. 

28. The method of Claim 24, wherein the neurodegenerative disease is an early onset 
neurodegenerative disease. 

29. A method of diagnosing or aiding in the diagnosis of neurodegenerative disease in 
an individual comprising 

a) obtaining a nucleic acid sample from the individual; and 

b) determining whether there is a deletion of a thymine at nucleotide position 
6594 of SEQ ID NO: 1, 

wherein deletion of a thymine at said position is indicative of increased likelihood 
of neurodegenerative disease in the individual as compared with an individual 
who does not have a deletion at said position. 

30. The method of Claim 29, wherein said neurodegenerative disease comprises one 
or more symptoms selected from the group consisting of: reduced sensory nerve 
conduction, reduced motor nerve velocity, hypermyelination of retinal nerve 
fibers, atrophy of upper cerebellar vermis, absence of Purkinje cells and abnormal 
neuronal lipid storage. 

31. The method of Claim 29, wherein the nucleic acid sample is obtained from a 
tissue selected from the group consisting of: brain tissue, CNS, lung, fetal lung, 
testis, lymphocytes, adipose, fibroblasts, skeletal muscle, pancreas, uterus, 
kidney, tonsil, embryo and isolated cells thereof. 

32. The method of Claim 31, wherein said brain tissue is selected from the group 
consisting of cerebral cortex, granular cell layer of the cerebellum and 
hippocampus. 

33. The method of Claim 29, wherein the neurodegenerative disease is an early onset 
neurodegenerative disease. 

34. A method of treating a neurodegenerative disorder associated with the presence of 
a thymine at nucleotide position 5254 of SEQ ID NO: 1 in an individual, 
comprising administering to the individual an agent selected from the group 
consisting of: 

a) a polypeptide encoded by SEQ ID NO: 2 or an active portion thereof; 

b) a nucleic acid molecule which encodes SEQ ID NO: 2 or an active portion 
of SEQ ID NO: 2; and 

c) an agonist of SEQ ID NO: 2. 

35. A method of treating a neurodegenerative disorder associated with a deletion at 
nucleotide position 6594 of SEQ ID NO: 1 in an individual, comprising 
administering to the individual an agent selected from the group consisting of: 

a) a polypeptide encoded by SEQ ID NO: 2 or an active portion thereof; 

b) a nucleic acid molecule which encodes SEQ ID NO: 2 or an active portion 
of SEQ ID NO: 2; and 

c) an agonist of SEQ ID NO: 2. 

36. A method of diagnosing or aiding in the diagnosis of neurodegenerative disease 
associated with the presence of a thymine at nucleotide position 5254 of SEQ ID 
NO: 1 in an individual, comprising: 

a) obtaining a sample comprising a Spastin polypeptide from the individual; 

b) determining the size of the Spastin polypeptide, 

wherein if the Spastin polypeptide is significantly shorter than SEQ ID NO: 2 it is 
indicative of neurodegenerative disease. 

37. A method according to Claim 36, wherein the Spastin polypeptide is significantly 
shorter than SEQ ID NO: 2 if the Spastin polypeptide comprises less than about 
75% of the amino acids of SEQ ID NO: 2. 

38. A method of diagnosing or aiding in the diagnosis of neurodegenerative disease 
associated with the presence of a deletion at nucleotide position 6594 of SEQ ID 
NO: 1 in an individual, comprising: 

a) obtaining a sample comprising a Spastin polypeptide from the individual; 

b) determining the size of the Spastin polypeptide, 

wherein if the Spastin polypeptide is significantly shorter than SEQ ID NO: 2 it is 
indicative of neurodegenerative disease. 

39. A method according to Claim 36, wherein the Spastin polypeptide is significantly 
shorter than SEQ ID NO: 2 if the Spastin polypeptide comprises less than about 
75% of the amino acids of SEQ ID NO: 2. 

IDENTIFICATION OF ARSACS MUTATIONS AND METHODS OF USE 
THEREFOR 

ABSTRACT OF THE DISCLOSURE 

Isolated spastin genes and fragments thereof, as well as Spastin proteins and 
fragments thereof, are disclosed. Also disclosed are altered forms of spastin, as well as 
methods for the diagnosis and treatment of neurodegenerative disease. 

(SEQ ID NO: 2) ^^NTFW?GRELIvc:^/Y?FaE:^^R.^^r:?svswL :or.^w7^^^^ 60 
(SEQ ID NO: 4) s,dkr...l l.m. 

GQTCVHLXHLRIPSLVILDDESEAQLPErLADIVQKLGGF^.-LKrCLDASIQKPlIKKYIHS 120 
0 V T r...R.,T V 

PLPSAV^QIMEKMPLQKLCNQITSLLPTHKDALRKFLASLTDSSEKcKRIIQ'ELAIFKRI 180 
I I A T T 

NKSSDQGISSYTKLKGCKVLHKTAKLPADLRLSISVIDSSDEATIRLAN]^ 240 
D T V K 

CLKL^/LKDIENAFYSKEEVTQLML^mvE^IL3SLKNENFNV^lEiVLTPLKFI 300 
. - - F G. . . .TQ I S. . .D. .M HM- .GHV.A. 

GELFDPDIEVLKDLFCNEEGTYFPPSVFTSPDILHSLRQIGLKNEASLKEKDV^/QVAKKI 360 
.D R...Y.. .EAC. . .TI S R_ 

EALQVGACPDQDVLLKKAKTLLLVLNKNKTLLQSSEGKJyiTLKKIKWPA^ 420 
SS .QM. . . .M Q A 

LVWKGDLCNLCAPPDMCDVGHAXLIGSSLPLVESIKVNLEKALGIFTKPSL3AVT.KK 480 
AA, .V.V V Q. .S TIN T 

VVDWYSSKTFSDEDYYQFQHILLEIYGFMKDHLNEGKDSFHALKFPWVXVTGKKFCPLAQA 540 
T S K N 

VIKPIHDLDLQPYLHNVPKTMAKFHQLFK^/CGSIEELTSDHISMVIQKIYLKSDQDLSEQ 600 
T Y A V E. . ,E 

ESKQNI.HL]yiLNIIRWLYSNQI?ASPNTPV?IHKSKNPSKLIMK?IKECCYCD::<^/DDL^ 660 
M Y. .R V 



. L .'..^..M.. , E. 



LLEDSVEPIILVHEDI?MKTAEWLKVPC::STRLINPENM3FEQSGQHEPLTVRIK^ 



720 



780 



840 



HIKDKSNPGIKINWSKQQKRLRKFPNQFKPFIDVFGCQLPLTVEAPYSYNGTLFRLSFRT 900 
R A 

QQEAK^SEVSSTCYNTADIYSLVDEFSLCGHRLIIFTQSVICSI^YLKYLKIEETNPSLAQD 960 
N 

WIIKKKSCSSKALNTPVLSVLKEAAKLMKTCSSSNKKLPSDEPKSSCILQir^yTlEFHHV 1020 
.1 V.P A T.V 

FRRIADLQSPLFRGPDDDPAALFEMAKSGQSKK?SDELSQKTVECTTWLLCTC:4DTGEAL 1080 
T P. . . .D I 

KFSLSESGRRLGLVPCGAVGVQLSEIQDQJCvVTVKPHIGE^/FCYLPLRIKTGLPVlilMGCF 1140 
. . . .N L.H.T.E r 

AVTSNRKEIWKTDTKGRWNTTFMRHVIVKAYLQ VLSVLRDLAT SGELJMDYTYYAVWPDPD 1200 
A IG. . .T 

LVHDDFSVICQGFYEDIAHGKGKSLTKVFSDGSTWS:-lK>n/T^FLDDSILKRRDVGSAAFK 1260 
K R M Q.K 

IFLKYLKKTGSKHLCAVELPSSVKLGFESAGCKQILLENTFSEKQFFSS^/FFPNIQEIEA 1320 
A 

ELRDPLMIFVLNEKVDEFSGVLRVTPCIPCSLEGKPLVLPSRLIHPEGRVAKLFDIKDGR 1330 
N L I V T 

FPYGSTQDYLNPIILi:<LVQLGKAKDDILWDDMLERAVSVAEINKSDKVAAC::.RSSILLS 1440 
M E A 



Figure 5A 



LI0EK1KIRD?RAKDFAAKYQTIR?L?FLTK?AG?SLDWKGNSFK?ETMFAATDLYTAEH 1500 
K ? E I....Y 

QDIVCLLQPILNENSHSFRGCGSVSLAVKEFLGLLKKP'T/ZDLVINQLKE^/AKSVDCGITL 1560 

Q 

YQENITNACYKYLHEALMQ^^EITKMSIIDKLK?FSFXLVENAYVDSEKVSFKLNFEAAPY 1620 
VL. . .MA. AT. .E C V. .E 

LYQLPNKYKNNFRELFETVGVRQSCTVEDFALVLESIDQERGTKQITEENFQLCRKIISE 1680 
S F K 



GIWSLIREKKQEFCEKNYGKILLPDTl^jynLLPAKSLCYmCPWIKVKDTTVKYCHADIPR 1740 
R L S 



EVAVKLGAVPKRHKALERYASNVCf 
I I. . 








ATSISPGRMFRDL 1 
, , .V 



DADFRTQFSDVLDLYLGTHFKLDNCTMFRFPLRNAEMAKVSEISSVPASDRMVQNLLDKL 1980 
N Q S 



RSDGAELLMFLNHMEKISICSIDKSTGALWVLYSVKGKITDGDRLKRKQFHASVIDSVTK 2040 
A. .G 

KKQLKDIPVQQITYTMDTSDSSGNLTTWLICNRSGFSSMEKVSKSVISAHKNQDITLFPR 2100 



GGVAACITHNYKKPHRAFCFLPLSLETGLPFHVNGHFALDSAKKNLWRDCNGVGVRSDWN 2160 



NSLJyrTALXAPAYVSLLIQLKKRYFPGSDPTLSVLQNTPIHVVKDTLKKFLSFFPVNm 2220 



QPDLYCLVKALYNCIHEDJyiKRLLPWRAPNIDGSDLHSAVIITWINMSTSNKTRPFFDNL 2280 
S 

LQDELQHLKNADYNITTRKWAENVYRLKIiLLLEIGFNLVYNCDETANLYHCLIDADIPV 2340 
V 

SYVTPADIRSFLMTFSSPDTNCHIGKLPCRLQQTNLKLFHSLKLLVDYCFKDAEENEIEV 2400 
V S.F. . 



.1. .G. 



EGL?LLITLDSVLQTFDAKRPKFLTTYHELIPSRKDLFMNTLYLKYSNILL^CKVAKVFD 2460 



.SV. 



ISSFADLLSSVLPREYKTKSCTKWKDNFASESWLKNAWHFXSESVSVKEDQEETKPTFDI 2520 
N.A TD P. .A. .V 

VVI)TLKDWALLPGTKFWSANQLWPEGDVLLPLSLMHIAVF?NAQSDK^/FHAI^^ 2580 
I . . I TS I 

QLALNKICSKDSAFVPLLSCHTANIESPTSILKALHYWQTSTFRAEKLVEl^FEALIi^ 2640 
L D. .A V T. . .M 

FNCNLNHIJxrSQDDIKILKSLPCYKSISGRYVSIGKFGTCYVLTKSIPSAEVEIGVTQSSSS 2700 
S M. .A 

AFLSEKIHLKELYEVIGCVPVDDLEVYLKHLLPKIENLSYDAKLEHLIYLKNi^ SSAEEL 2760 
V L A.I. .P 

SEIKEOLFEKLSSLLIIH DANSRLKOAKHFYDRTVRVFEVMLPEKLFIPNDFFKKLSQLI 2820 
.N KS V. 

KPKNHVTFMTSWVEFLRNIGLXYILSQQQLLQFAKSISVHANTENWSKETLQimrDIL^^ 2880 
. . . .QAA A S 
HI?QERKDLLSGNFLKELSLI?FLC?£KA?A£FIRFH?QYQE\,''NGTL?LIKFNGACV'N?K 2940 
Y 

FKQCDVLQLLWTSCPILPEKATPLSIKECEGSDLGPOEQLEQVLiNTMLNVTJLDPPLDKVI^ 300 0 
A 

NCRNICNITTLDEEM\/'KTRAKVLRSI YEFLSAEKREFRFQLRGVAF^y-MVEDGWKLLKPEE 3 060 



WINLEYESDFKPYLYKLPLELGTFHQLFKHLGTEDIISTKQWEVLSRIFKNSEGKQLD 3120 
A S 

PNEMRTVKHWSGLFRSLQOTSVKVRSDLEIWRDLAXYLPSQDGRLVKSSrLV 3180 
K A K 

KSRIQGNIGVQMLVDLSQCYLGKDHGFHTKLIMLFPQKLRPRLLSSILEEQLDEETPKVC 3240 



QFGALCSLQGRLQLLLSSEQFITGLIRIMKHENDNAFLANEEKAIRLCKALREGLKVSCF 3 300 



EKLQTTLRTOGFNFIPHSRSETFAFLKRFGNAVILLYIQHSDSKDINF 



XALAKTLKSAT 3 360 



DNLI3DTSYl|rAiyiLGCNDIYRIGEKLDSLGVlCYDSSEPSKL£L?MFGTPIPAEIHYTLLM 3420 
S 



DFMNVFYPGSY^/GYLVDAEGGDI YGS YQ PTYTYAI IVQ E'^/'EREDADNSSFLGKI YQ IDIG 3480 
T 

YSEYKIVSSLaLYKFSR?EESSQSRDSA?STPTSPTSFLT?GLRSI??LFSGRESKKT-S 3540 
D. . . .N T K. . . .SP. 



skkqspkklkvnslpeilkevtsweqawk: 

T. .H. .R A 





3600 



TSASRFQSDKYSFQRFYT SWNOEATSHK 3660 



SEROCONKEK CIPPSAGQTYSQRFFVPPTFKSVGNPVEARRWLRQARANFSAARNDLHKNA 3720 



NEWVCFKCYLSTKLALXAADYAVKGKSDKDVK? TAL AO KI EE YSOOL SGLTNDVHTL SAY 3780 



GVDSLKTRYPDLLPF?QIPNDRFTSEV*AMRVMECTACIIIKL£NFMQQKV 3 83 0 
I 

LOCUS       AF193557 11493 bp DNA ROD 
DEFINITION  Mus musculus sacsin gene, complete cds. 
ACCESSION   AF193557 
VERSION     AF193557.1 GI:6907043 
KEYWORDS    . 
SOURCE      house mouse. 
  ORGANISM  Mus musculus 
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus. 
REFERENCE   1 (bases 1 to 11493) 
  AUTHORS   Engert,J.C., Berube,P., Mercier,J., Dore,C., Lepage,P., Ge,B., 
            Bouchard,J.P., Mathieu,J., Melancon,S.B., Schalling,M., 
            Lander,E.S., Morgan,K., Hudson,T.J. and Richter,A. 
  TITLE     ARSACS, a spastic ataxia common in northeastern Quebec, is caused 
            by mutations in a new gene encoding an 11.5-kb ORF 
  JOURNAL   Nat. Genet. 24 (2), 120-125 (2000) 
  MEDLINE   20120709 
REFERENCE   2 (bases 1 to 11493) 
  AUTHORS   Engert,J.C., Berube,P., Dore,C., Lepage,P., Ge,B., Hudson,T.J. and 
            Richter,A. 
  TITLE     Direct Submission 
  JOURNAL   Submitted (08-OCT-1999) Genome Centre, Montreal General Hospital, 
            1650 Cedar Ave., Montreal, QC H3G 1A4, Canada 
FEATURES             Location/Qualifiers 
     source          1..11493 
                     /organism="Mus musculus" 
                     /db_xref="taxon:10090" 
     mRNA            <1..>11493 
                     /product="sacsin" 
     CDS             1..11493 
                     /note="molecular chaperone" 
                     /codon_start=1 
                     /product="sacsin" 
                     /protein_id="AAF31263.1" 
                     /db_xref="GI:6907044" 
                     /translation="MNTFWPGRELVVQWYPFSEDKRHPSLSWLKMVWKNLYIHFSEDL 
                     TLFDEMPLIPRTLLNEDQTCVELIRLRIPSVVILDDETEAQLPEFLADIVQKLGGIVL 
                     KRLDTSIQHPLVKKYIHSPLPSAILQIMEKIPLQKLCNQIASLLPTHKDALRKFLASL 
                     TDTSEKEKRIIQELTIFKRINHSSDQGISSYTKLKGCKVLDHTAKLPTDLRLSVSVID 
                     SSDEATIRLANMLKIEKLKTTSCLKFVLKDIGNAFYTQEEVTQLMLWILENLSSLKNE 
                     NSNVLDWLMPLKFIHMSQGHVVAAGDLFDPDIEVLRDLFYNEEEACFPPTIFTSPDIL 
                     HSLRQIGLKNESSLKEKDVVQVARKIEALQVSSCQNQDVLMKKAKTLLLVLNKNQTLL 
                     QSSEGKMALKKIKWVPACKERPPNYPGSLVWKGDLCNLCAPPDMCDAAHAVLVGSSLP 
                     LVESVHVNLEQALSIFTKPTINAVLKHFKTVVDWYTSKTFSDEDYYQFQHILLEIYGF 
                     MHDHLSEGKDSFKALKFPWVWTGKNFCPLAQAVIKPTHDLDLQPYLYNVPKTMAKFHQ 
                     LFKACGSIEELTSDHISMVIQKVYLKSDQELSEEESKQNLHLMLNIMRWLYSNQIPAS 

Figure 8A 



PNTPVPIYHSRNPSKLVMKPIHECCYCDIKVDDLNDLLEDSVEPIILVHEDIPMKTAE 

WLKVPCLSTRLINPENMGFEQSGQREPLTVRIKNILEEYPSVSDIFKELLQNADDANA 

TECSFMIDMRRNMDIRENLLDPGMAACHGPALWSFNNSEFSDSDFLNITRLGESLKRG 

EVDKVGKFGLGFNSVYHITDIPIIMSREFMIMFDPNINHISKHIKDRSNPGIKINWSK 

QQKRLRKFPNQFKPFIDVFGCQLPLAVEAPYSYNGTLFRLSFRTQQEAKVSEVSSTCY 

NTADIYSLVDEFSLCGHRLIIFTQSVNSMYLKYLKIEETNPSLAQDTIIIKKKVCPSK 

ALNAPVLSVLKEAAKLMKTCSSSNKKLPTDVPKSSCILQITVEEFHHVFRRIADLQSP 

LFRGPDDDPATLFEMAKSGQSKKPSDELPQKTVDCTTWLICTCMDTGEALKFSLNESG 

RRLGLVPCGAVGVLLHETQEQKWTVKPHIGEVFCYLPLRIKTGLPIHINGCFAVTSNR 

KEIWKTDTKGRWNTTFMRHVIVKAYLQALSVLRDLAIGGELTDYTYYAVWPDPDLVHD 

DFSVICKGFYEDIAHGKGKELTRVFSDGSMWVSMKNVRFLDDSILQRKDVGSAAFKIF 

LKYLKKTGSKNLCAVELPSSVKAGFEEAGCKQILLENTFSEKQFFSEVFFPNIQEIEA 

ELRDPLMNFVLNEKLDEFSGILRVTPCVPCSLEGHPLVLPSRLIHPEGRVAKLFDTKD 

GRFPYGSTQDYLNPIILIKLVQLGMAKDDILWDDMLERAESVAEINKSDHAAACLRSS 

ILLSLIDEKLKIKDPRAKDFAAKYQTIPFLPFLTKPAGFSLEWKGNSFKPETMFAATD 

IYTAEYQDIVCLLQPILNENSHSFRGCGSVSLAVKEFLGLLKKPTVDLVINQLKQVAK 

SVDDGITLYQENITNACYKYLHEAVLQNEMAKATIIEKLKPFCFILVENVYVESEKVS 

FHLNFEAAPYLYQLPNKYKNNFRELFESVGVRQSFTVEDFALVLESIDQERGKKQITE 

EMFQLCRRIISEGIWSLIREKRQEFCEKNYGKILLPDTNLLLLPAKSLCYNDCPWIKV 

KDSTVKYCHADIPREVAVKLGAIPKRHKALERYASMICFTALGTEFGQKEKLTSRIKS 

ILNAYPSEKEMLKELLQNADDAKATEICFVFDPRQHPVDRIFDDKWAPLQGPALCVYN 

NQPFTEDDVRGIQNLGKGTKEGNPCKTGHYGIGFNSVYHITDCPSFISGNDILGIFDP 

HARYAPGATSVSPGRMFRDLDADFRTQFSDVLDLYLGNHFKLDNCTMFRFPLRNAEMA 

QVSEISSVPSSDRMVQNLLDKLRSDGAELLMFLNHMEKISICEIDKATGGLNVLYSVK 

GKITDGDRLKRKQFHASVIDSVTKKRQLKDIPVQQITYTMDTEDSEGNLTTWLICNRS 

GFSSMEKVSKSVISAHKNQDITLFPRGGVAACITHNYKKPHRAFCFLPLSLETGLPFH 

VNGHFALDSARRNLWRDDNGVGVRSDWNNSLMTALIAPAYVELLIQLKKRYFPGSDPT 

LSVLQNTPIHWKDTLKKFLSFFPVNRLDLQPDLYCLVKALYSCIHEDMKRLLPWRA 

Figure 8B 



PNIDGSDLHSAVIITWINMSTSNKTRPFFDNLLQDELQHLKNADYNITTRKTVAENVY 

RLKHLLLEIGFNLVYNCDETANLYHCLVDADIPVSYVTPADVRSFLMTFSSPDTNCHI 

GKLPCRLQQTNLKLFHSLKLLVDYCFKDAEESEFEVEGLPLLITLDSVLQIFDGKRPK 

FLTTYHELIPSRKDLFMNTLYLKYSSVLLNCKVAKVFDISSFADLLSSVLPREYKTKN 

CAKWKDNFASESWLKNAWHFISESVSVTDDQEEPKPAFDVIVDILKDWALLPGTKFTV 

STSQLWPEGDVLIPLSLMHIAVFPNAQSDKVFHALMKAGCIQLALNKICSKDSALVP 

LLSCHTANIDSPASILKAVHYMVQTSTFRTEKLMENDFEALLMYFNCNLSHLMSQDDI 

KILKSLPCYKSISGRYMSIAKFGTCYVLTKSIPSAEVEKWTQSSSSAFLEEKVHLKEL 

YEVLGCVPVDDLEVYLKHLLPKIENLSYDAKLEHLIYLKNRLASIEEPSEIKEQLFEK 

LESLLIIHDAmRLKQAKHFYDRTWVFEVMLPEKLFIPKEFFKKLEQVIKPKNQAAF 

MTSWVEFLRNIGLKYALSQQQLLQFAKEISVRANTENWSKETLQSTVDILLHHIFQER 

MDLLSGNFLKELSLIPFLCPERAPAEYIRFHPQYQEVNGTLPLIKFNGAQVNPKFKQC 

DVLQLLWTSCPILPEKATPLSIKEQEGSDLAPQEQLEQVLNMLNVNLDPPLDKVINNC 

RNICNITTLDEEWKTRAKVLRSIYEFLSAEKREFRFQLRGVAFVMVEDGWKLLKPEE 

WINLEYEADFKPYLYKLPLELGTFHQLFKHLGTEDIISTKQYVEVLSRIFKSSEGKQ 

LDPNEMRTVKRWSGLFKSLQNDSVKVRSDLENARDLALYLPSQDGKLVKSSILVFDD 

APHYKSRIQGNIGVQMLVDLSQCYLGKDHGFHTKLIMLFPQKLRPRLLSSILEEQLDE 

ETPKVCQFGALCSLQGRLQLLLSSEQFITGLIRIMKHENDNAFLANEEKAIRLCKALR 

EGLKVSCFEKLQTTLRVKGFMPIPHSRSETFAFLKRFGNAVILLYIQHSDSKDINFLL 

ALAMTLKSATDNLISDTSYLIAMLGCNDIYRISEKLDSLGVKYDSSEPSKLELPMPGT 

PIPAEIHYTLLMDPMNVFYPGEYVGYLVDAEGGDIYGSYQPTYTYAIIVQEVEREDAD 

NTSFLGKIYQIDIGYSEYKIVSSLDLYKFSRPDESSQNRDSAPTTPTSPTEFLTPGLR 

SIPPLFSGKESHKSPSTKHHSPRKLKVNALPEILKEVTSVVEQAWKLPESERKKIIRR 

LYLKWHPDKNPENHDIANEVFKHLQNEINRLEKQAFLDQNADRASRRTFSTSASRFQS 

DKYSFQRFYTSWNQEATSHKSERQQQSKEKCPPSAGQTYSQRFFVPPTFKSVGNPVEA 

RRWLRQARANFSAARNDLHKNANEWVCFKCYLSTKLALIAADYAVRGKSDKDVKPTAL 

AQKIEEYSQQLEGLTNDVHTLEAYGVDSLKTRYPDLLPFPQIPNDRFTSEVAMRVMEC 
TAGIIIKLEMFIQQKV" 



Figure 8C 



BASE COUNT 3599 a 2281 c 2387 g 3226 t 

ORIGIN 

1 atgaatacat tctggcctgg tcgagagttg gtggttcagt ggtatccatt tagtgaagac 
61 aaacgtcacc catccctttc atggcttaag atggtttgga agaatctcta tatacatttc 
121 tcggaagatt tgactttatt tgatgagatg ccacttatcc ctagaactct actgaatgag 
181 gaccagacgt gtgtggaact catcagactc aggatcccat cagtagtcat tttagatgat 
241 gaaactgaag ctcagcttcc agaattctta gcagatattg tacaaaaact tggagggatt 
301 gtcctgaaaa gactagatac ctctattcag catccacttg ttaaaaaata cattcattcc 
361 ccactcccga gtgctatttt gcagataatg gagaagatac ctctacagaa gttgtgtaat 
421 caaatagcat cattacttcc aacccacaaa gatgctctaa ggaagttttt ggccagctta 
481 actgatacca gtgaaaaaga gaaaagaata attcaagaat tgacaatatt caaaagaatt 
541 aatcactcat cagatcaagg gatttcctct tacacaaaat taaaaggatg taaagttttg 
601 gatcataccg ccaagcttcc aacagatcta cggctatcag tttcagtaat agatagtagt 
661 gatgaagcca ccattcgttt ggcaaacatg ttgaaaattg aaaaattgaa gactacaagc 
721 tgtttaaagt ttgttttaaa agatattgga aatgcatttt atacacagga agaggtaaca 
781 caacttatgc tttggatcct tgagaatcta tcctctctta aaaatgagaa ttcaaatgtg 
841 cttgattggt taatgccact aaaattcatt catatgtccc agggacatgt ggtagcagct 
901 ggtgatctct ttgatcctga tatagaagta ctaagggatc tcttttataa tgaagaagaa 
961 gcttgtttcc cacctacaat ttttacctca ccagatatcc ttcactcttt gagacagatt 
1021 ggcttaaaaa atgaatccag tctaaaagaa aaagatgttg tacaagtggc aagaaaaatt 
1081 gaagctttac aggtcagttc ctgtcagaat caggatgttc tcatgaagaa agccaaaaca 
1141 ctcttactgg tcttgaataa aaaccagaca ctcttgcagt cttctgaagg gaagatggca 
1201 ttgaagaaaa tcaaatgggt tccagcctgc aaggaaagac ctccaaatta tcccggttcc 
1261 ttagtctgga aaggggatct ctgtaatctt tgtgcacctc cagatatgtg tgatgcggca 
1321 catgcagttc tagtaggctc ctcacttcct cttgttgaaa gtgtccatgt gaacctggag 
1381 caggcgctca gcatcttcac aaagcctact atcaatgctg tcttaaaaca ctttaaaact 
1441 gttgttgact ggtatacttc aaaaaccttt agtgatgaag attactatca gttccaacat 
1501 attttgcttg aaatttatgg gttcatgcat gatcatctga gtgaagggaa ggattctttt 
1561 aaagccttga agtttccatg ggtttggact ggcaaaaact tttgtcctct tgcccaggct 
1621 gtgataaagc caacccatga tctggatctt cagccttatt tatataatgt gcctaaaacc 
1681 atggcaaaat tccaccagct gttcaaggct tgtggctcaa tagaagagtt gacatcagat 
1741 catatttcca tggtcattca gaaagtttat ctcaaaagtg accaggagtt gagtgaagaa 
1801 gaaagtaaac aaaatcttca tctcatgttg aatattatga gatggctcta tagcaatcag 
1861 attccagcaa gccctaatac accagttcct atttatcaca gcagaaatcc ttccaaactt 
1921 gtcatgaagc caattcatga atgctgttat tgtgacatca aagttgatga cctcaatgac 
1981 ttgcttgaag attcagtgga accaattatc ttggtacatg aagatatacc catgaaaact 
2041 gcagaatggc taaaagttcc gtgccttagt acaagactga tcaatcctga aaacatgggg 
2101 tttgagcagt cagggcaaag agagcctctt actgtaagga ttaaaaatat tttggaagaa 
2161 tacccttccg tgtcagatat ttttaaagag ctacttcaaa atgctgatga tgcaaatgcc 
2221 acagaatgca gcttcatgat tgatatgaga aggaatatgg acatacggga aaatctcctg 
2281 gacccaggga tggcagcttg tcatggacct gctctgtggt cattcaacaa ttctgaattc 
2341 tcagattcag atttcttaaa cataacgagg ttaggagagt ctttaaaaag gggagaagtt 
2401 gacaaggttg ggaaatttgg tcttggtttt aattctgtgt accacatcac tgacattccc 
2461 atcattatga gcagagaatt tatgataatg tttgatccaa acataaatca tatcagcaaa 
2521 cacattaaag atagatcgaa tcctggaatc aaaattaatt ggagtaagca gcagaaaaga 
2581 cttaggaagt tccccaacca gttcaaacca tttatagatg tatttggctg tcagttacct 
2641 ttggctgttg aagctcctta cagctacaat ggaactcttt tccgactgtc ctttagaaca 
2701 cagcaggaag caaaagtgag tgaagttagc agtacttgct acaatactgc ggatatttac 
2761 tccctagtgg atgaatttag tctttgtggg cacagactta tcatttttac tcagagtgta 
2821 aactcgatgt atttgaaata cttgaaaatt gaagaaacca atcctagctt agcacaagat 
2881 acaatcataa ttaagaaaaa agtttgcccc tccaaagcat tgaatgcacc agttttaagt 
2941 gttttaaaag aagctgctaa actcatgaag acttgtagca gcagcaacaa gaagcttccc 
3001 acggatgtgc caaagtcatc ttgcattctt cagatcacag tcgaagaatt ccaccatgtg 
3061 tttaggagga ttgctgactt acagtcacca ctatttcgag gtccagatga tgacccagct 
3121 actctctttg aaatggctaa atctggccaa tcaaaaaagc catcagatga gttgccacaa 
3181 aagacagtag attgtaccac atggcttata tgcacatgca tggatacagg agaagctctc 
3241 aagttttcct tgaatgaaag tggaagaaga ttagggctgg ttccttgtgg ggcagtaggg 
3301 gttctcttgc atgaaaccca ggaacagaag tggaccgtga aaccacacat aggagaagtg
3361 ttttgctatt tacctctacg aatcaaaaca gggttgccaa ttcacatcaa tgggtgcttt 
3421 gctgttactt caaataggaa agaaatctgg aagacagata caaaaggtcg atggaatacc 
3481 acattcatga ggcatgtcat tgtgaaagct tacttacaag ccctcagtgt cttacgggac 
3541 ctagccattg gtggtgagct gactgattat acttactatg cagtgtggcc tgatcctgat 
3601 ctagttcatg atgacttctc tgtgatctgt aaaggatttt atgaagacat tgctcatggg
3661 aaggggaagg agttgaccag agtcttctct gatgggtcta tgtgggtttc catgaagaat
3721 gtgaggtttc tggatgactc tatacttcaa aggaaagatg ttggttcagc agccttcaag 
3781 atatttctga agtacctcaa gaaaacagga tccaaaaacc tctgtgctgt tgagcttcct 
3841 tcttcagtaa aagcaggatt tgaagaggct ggctgtaaac agatactgct ggaaaataca
3901 ttttcagaga aacagttctt ttcagaagtc ttctttccta atatccagga aattgaagca 
3961 gaacttagag atcctctgat gaattttgtc ctaaatgaaa aacttgatga gttctcagga 
4021 attcttcgtg ttaccccttg tgttccttgc tccttggagg gccatccttt ggttttgcct 
4081 tcaagattga tccatcctga aggacgagtt gcaaagttat ttgatactaa agatggaaga 
4141 ttcccttatg gttccacaca ggattacctc aatcctatta tcttgattaa gctcgttcag 
4201 ttaggcatgg caaaagatga tattttgtgg gatgacatgc tagagcgtgc agagtctgta
4261 gctgagatta ataaaagtga ccatgctgct gcctgcttaa ggagtagtat tctgctaagc 
4321 cttattgatg agaagctaaa aataaaggat cctagagcaa aggattttgc tgcaaaatat 
4381 caaacaattc ccttcctccc atttctaaca aagccagcag gtttttcttt agaatggaaa 
4441 gggaacagct ttaagcctga aaccatgttt gcagcaactg acatttacac agctgaatat 
4501 caagatatag tctgtctttt gcaaccaatt cttaatgaaa attcccattc ctttagaggc 
4561 tgtggttcag tgtctttggc tgttaaggag tttttgggtt tactaaagaa gccaacagtt 
4621 gatctggtaa taaaccagtt gaagcaagtt gcaaaatcag ttgatgatgg cattacattg 
4681 taccaggaaa atatcaccaa cgcttgctac aaatacctcc atgaagcagt attgcagaat 
4741 gaaatggcca aggcaacaat tattgagaag ctaaagccat tttgtttcat tctagttgag 
4801 aatgtatatg ttgagtcaga aaaggtttct tttcacttga actttgaagc agcaccatac 
4861 ctttatcagt tacctaacaa gtataaaaat aatttccgtg agctttttga aagtgtgggt 
4921 gtgcgacagt catttactgt tgaagacttt gccctagttt tggagtctat tgatcaagag 
4981 agaggaaaaa aacaaataac agaagagaat tttcagcttt gccgacgaat aatcagtgaa 
5041 ggcatctgga gtctcattag agaaaagaga caagaatttt gtgagaaaaa ttatggcaaa 
5101 atattactgc cagacactaa cctgctgctg ctccctgcta agtcattatg ctacaatgac 
5161 tgtccctgga taaaagtaaa ggactccact gtcaagtatt gccatgccga cataccccgg 
5221 gaagtagctg taaaacttgg tgcaatacca aagagacata aagcattaga aagatatgca 
5281 tccaacatct gtttcacagc tctaggtaca gaatttgggc agaaagaaaa actgaccagc
5341 agaattaaga gcattctcaa tgcctatcct tcagaaaagg aaatgctgaa agagcttctt 
5401 caaaatgctg atgatgcaaa ggccacagag atctgctttg tgtttgatcc tagacagcat 
5461 cctgttgacc gaatatttga tgataagtgg gccccactgc aagggccagc actgtgtgtt 
5521 tacaacaacc agccatttac agaagatgat gttagaggaa ttcagaatct tgggaaaggc 
5581 accaaagaag ggaatccttg caaaacagga cattatggaa tcggattcaa ttccgtttat 
5641 catattacag actgcccttc ttttatttct ggcaatgaca tcctgggtat ttttgatccc 
5701 catgccagat atgcaccagg agccacatca gttagccctg gacgcatgtt tagagatttg 
5761 gatgcagact ttagaaccca gttctcagat gttctagatc tgtacttggg aaaccacttt 
5821 aaactggaca attgtacaat gtttagattt cctctgcgta atgcagagat ggcacaagtt
5881 tcagaaattt cttccgttcc atcatcagac agaatggtcc agaatctttt ggacaagtta 
5941 cggtctgatg gggcagaact tctaatgttt ctcaaccaca tggagaaaat atctatttgt
6001 gaaatagata aggccacagg aggtctgaat gtgctctatt cagtaaaagg caagatcact 
6061 gatggagacc gattgaaaag gaagcaattc cacgcctctg taattgacag tgttactaaa 
6121 aagagacagc tcaaggacat accagttcaa caaataacct acactatgga tactgaggat 
6181 tctgaaggaa atctgaccac atggctcatc tgtaatagat caggattttc aagtatggaa 
6241 aaagtatcca agagtgtaat atcagctcac aagaaccaag atatcaccct tttcccacgt 
6301 ggtggagtag cagcctgcat tactcacaat tataaaaagc cccacagagc cttctgcttt 
6361 ctgcctctct ctttggagac agggctgcca tttcatgtga atggccactt tgctctagat 
6421 tcagccagaa gaaacttgtg gcgtgatgat aatggggttg gtgttcgaag tgactggaat 
6481 aatagtttaa tgacagcatt aatagcacct gcatatgttg agttactaat ccagttaaaa 
6541 aaacggtatt tccctggttc tgacccaaca ttatcagttt tacagaacac acccattcat 
6601 gtcgtaaagg acacattaaa gaagtttctg tccttctttc cagttaacag gctggatctg 
6661 cagccggact tatattgctt agtaaaagca ctttacagtt gcattcatga agacatgaag 
6721 cgtcttttgc ctgttgttcg ggctccaaat attgatggct cagatttgca ctctgcagtc
6781 ataattactt ggatcaatat gtctacttca aataaaacta gaccattttt tgataactta 
6841 ctacaggatg aattacagca ccttaaaaat gcagattata acatcacaac tcgaaaaaca 
6901 gtcgcagaga atgtctacag actgaagcac ctgctcttag aaattggttt caacttggtt 
6961 tataactgtg atgaaactgc taacctttac cattgccttg tagatgcaga tatccctgtc 
7021 agctatgtga ctcctgctga tgttaggtcc ttcttaatga ctttctcttc tcctgacact 
7081 aattgccata ttgggaagct gccttgtcgt cttcagcaga ctaacctaaa actttttcac 
7141 agtttaaaac ttttagttga ttactgtttt aaagatgctg aagaaagtga gtttgaagtt 
7201 gagggactgc ccctactcat tacactggac agtgtcttgc agatttttga tggtaaacga 
7261 cccaagtttc taacaacata ccatgaatta attccatcgc gtaaagactt gtttatgaac 
7321 accttatact tgaaatacag tagtgttttg ttgaactgca aagttgcaaa agtgtttgac 
7381 atttccagct ttgctgactt actctcttct gtgttgcctc gtgagtacaa gaccaaaaac 
7441 tgtgcaaagt ggaaagacaa ttttgccagt gaatcttggc ttaagaacgc atggcatttt 
7501 atcagtgaat cagtaagtgt aacggatgat caggaagaac caaagccagc atttgatgtc 
7561 attgttgaca tccttaaaga ctgggcattg cttccaggaa caaagttcac tgtgtcaacc 
7621 agtcagcttg tggttcctga gggagacgtg ttgattcccc tgagcctcat gcacattgct 
7681 gtgttcccaa atgctcagag tgataaggtt tttcacgctc tgatgaaagc tggctgtatt 
7741 cagctggctt tgaacaaaat ctgctctaaa gacagcgcat tagttcctct gttgtcatgc 
7801 cacacagcaa acatagatag ccctgcaagc atcttgaagg ctgtgcatta tatggttcag 
7861 acgtcaacat ttagaactga aaaactaatg gaaaatgact ttgaagcact tttgatgtat 
7921 ttcaactgta atttgagtca cttgatgtcc caagatgaca taaaaatttt aaagtccctc 
7981 ccatgctaca aatccatcag tggccgctat atgagcattg caaaatttgg aacgtgctat 
8041 gtgcttacca aaagtattcc ttcagctgaa gtggaaaaat ggacacagtc atcctcttcc 
8101 gcgtttcttg aagaaaaggt gcatttaaaa gaactctatg aggtgcttgg ctgtgtgcca 
8161 gtagatgatc tggaggtgta tttgaaacat cttctgccaa aaattgaaaa tctctcttat 
8221 gatgcaaagt tggagcacct gatttatctg aagaatagac tggcaagcat cgaggaaccg 
8281 tcagagatta aggagcaact ttttgaaaaa ctggaaagct tattgattat ccacgatgcc 
8341 aacaatcgac taaagcaagc aaaacatttc tatgacagaa ctgtgagagt ttttgaagtt 
8401 atgcttcctg aaaaattgtt tattcctaag gagttcttta aaaaattgga acaagtaatc 
8461 aaacctaaaa atcaagctgc atttatgacg tcctgggtgg aattcttgag aaatattgga 
8521 ctgaagtacg cgctctccca gcagcagttg ttacagtttg ccaaggaaat cagtgtgagg 
8581 gcaaatacag aaaactggtc taaagaaacc ctgcaaagta cagttgacat ccttctccat 
8641 cacatattcc aagaacgaat ggatttgtta tctggaaatt ttctgaaaga actgtcctta 
8701 ataccattct tgtgtcctga acgggccccc gctgagtaca ttcggtttca ccctcagtac 
8761 caggaggtaa acggaacact tcctcttata aagttcaatg gagcacaagt gaatccaaag 
8821 ttcaagcaat gtgatgtact ccagctgctg tggacatctt gccctattct tccagagaaa 
8881 gccacaccgt tgagcattaa agaacaagaa ggcagtgacc tcgctccaca ggaacagctt 
8941 gaacaagttt taaatatgct taatgttaac ctggaccccc ctcttgataa ggtcattaat 
9001 aattgcagaa acatatgcaa cataacaact ttggatgagg aaatggtaaa aactagagca 
9061 aaggtcctaa ggagcatata tgaatttctg agtgcagaaa aacgagagtt ccgttttcag 
9121 cttcggggtg tggcctttgt aatggtagaa gacggatgga aacttctgaa gcctgaggaa 
9181 gtagtgataa acctggagta tgaggctgat tttaaacctt atctgtacaa gctgccttta 
9241 gagcttggca cttttcatca gctgttcaaa catttaggta ctgaagatat catttccact 
9301 aagcaatatg ttgaagtgtt aagccgaata ttcaaaagct ctgaaggaaa gcagctagac
9361 cctaatgaaa tgcgtacagt taagagagtg gtttctggcc tattcaagag tctacaaaat 
9421 gattcagtca aggtgaggag tgacctggag aatgcccggg acctcgcact ctaccttcca 
9481 agccaggatg ggaagttggt gaagtcaagc atcttggtgt tcgatgatgc gccacattat 
9541 aaaagtagga tccaggggaa tattggcgtg cagatgctag ttgatcttag ccagtgctac 
9601 ttagggaaag accatggatt tcacactaag ctgataatgc tctttcctca aaagcttcga 
9661 cctcgtctgc tgagcagtat acttgaagag cagcttgatg aggagacccc taaagtgtgc 
9721 cagtttggcg cattgtgctc tcttcaggga agactgcagc ttctcttgtc ttcagagcag 
9781 ttcatcacag gactcattcg aatcatgaag catgaaaatg ataatgcttt cctggccaat 
9841 gaagaaaaag ccataagact ttgcaaagct ctaagagaag ggctgaaagt ttcctgtttt 
9901 gagaagcttc agacaacatt aagggttaaa ggttttaatc ctattcccca tagcaggagt 
9961 gaaactttcg cttttctaaa gcgatttggc aatgcagtca tcttgctcta catccaacat 
10021 tcagacagca aagacattaa ctttctgcta gccttagcga tgacacttaa atcagcaact 
10081 gacaatttga tttctgacac gtcatactta attgctatgc tgggatgcaa tgacatttac 
10141 aggatcagtg agaagcttga cagtttaggg gtgaaatacg actcctctga gccatcaaaa
10201 ctggaactcc ccatgcctgg cacaccaata cccgctgaga tccattacac actacttatg 
10261 gatccaatga atgtttttta tcctggggaa tatgttggtt accttgtgga tgctgaaggt 
10321 ggtgatatct atgggtcata ccagccaaca tacacatacg caattattgt gcaagaagtt 
10381 gaaagagaag atgctgacaa tactagtttc ttaggaaaga tctatcagat cgatattggc 
10441 tacagtgaat ataagatagt cagctctctt gatctgtaca agttctcaag gcctgatgaa 
10501 agctcccaaa acagagacag tgctcccacc acaccaacaa gccccaccga attcctgact 
10561 cctggtctga gaagcatccc tcctcttttc tctggcaagg agagccacaa gtctccctcc 
10621 accaaacacc attcccccag aaagctcaag gtgaatgctt taccagaaat cttaaaagaa 
10681 gtgacatcag tggtggagca agcttggaag cttccagaat cagagcggaa aaagatcatt 
10741 agacgcttgt atttgaagtg gcaccctgac aaaaatccag aaaatcatga tattgctaat 
10801 gaagtgttca agcacctgca gaatgaaatc aacagattag aaaaacaggc ttttctggat 
10861 caaaatgcag acagagcttc aagaagaaca ttttcaacct ctgcatctcg atttcagtca 
10921 gacaagtact catttcaaag attttacact tcgtggaatc aagaagccac aagtcataaa 
10981 tctgaaaggc aacagcaaag caaagagaaa tgccctcctt ctgctggaca gacatactct 
11041 caaaggttct ttgttcctcc caccttcaag tcagtgggca atccagtgga agcccggaga 
11101 tggttaagac aagccagagc aaacttctca gctgccagga atgaccttca caaaaatgcc 
11161 aatgaatggg tgtgcttcaa gtgttacctt tccaccaagc tggctttgat tgcagccgac 
11221 tatgctgtca gggggaaatc tgataaagat gtaaagccaa ctgcacttgc acaaaagata 
11281 gaggagtaca gtcagcagct ggaaggactg acaaacgatg tgcacacatt ggaagcttat 
11341 ggtgtagaca gcttgaaaac aaggtaccct gatttgcttc cttttccgca gattcccaat 
11401 gacaggttca catctgaggt tgccatgagg gtgatggaat gcactgcctg tatcatcata 
11461 aaacttgaaa attttataca acagaaggtg tga 
LOCUS       AF193556    12793 bp    DNA    PRI
DEFINITION  Homo sapiens sacsin (SACS) gene, complete cds.
ACCESSION   AF193556
VERSION     AF193556.1  GI:6907041
KEYWORDS
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
            Euteleostomi; Mammalia; Eutheria; Primates; Catarrhini;
            Hominidae; Homo.
REFERENCE   1 (bases 1 to 12793)
  AUTHORS   Engert,J.C., Berube,P., Mercier,J., Dore,C., Lepage,P., Ge,B.,
            Bouchard,J.P., Mathieu,J., Melancon,S.B., Schalling,M.,
            Lander,E.S., Morgan,K., Hudson,T.J. and Richter,A.
  TITLE     ARSACS, a spastic ataxia common in northeastern Quebec, is
            caused by mutations in a new gene encoding an 11.5-kb ORF
  JOURNAL   Nat. Genet. 24 (2), 120-125 (2000)
  MEDLINE   20120709
REFERENCE   2 (bases 1 to 12793)
  AUTHORS   Engert,J.C., Berube,P., Dore,C., Lepage,P., Ge,B., Hudson,T.J.
            and Richter,A.
  TITLE     Direct Submission
  JOURNAL   Submitted (08-OCT-1999) Genome Centre, Montreal General
            Hospital, 1650 Cedar Ave., Montreal, QC H3G 1A4, Canada
FEATURES             Location/Qualifiers
     source          1..12793
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="13"
                     /map="between D13S232 and D13S292"
     mRNA            1..12793
                     /gene="SACS"
                     /product="sacsin"
     gene            1..12793
                     /gene="SACS"
     CDS             77..11566
                     /gene="SACS"
                     /note="molecular chaperone"
                     /codon_start=1
                     /product="sacsin"
                     /protein_id="AAF31262.1"
                     /db_xref="GI:6907042"
                     /translation="
MNTFWPGRELIVQWYPFDENRNHPSVSWLKMVWKNLYIHFSEDL
TLFDEMPLIPRTILEEGQTCVELIRLRIPSLVILDDESEAQLPEFLADIVQKLGGFVL
KKLDASIQHPLIKKYIHSPLPSAVLQIMEKMPLQKLCNQITSLLPTHKDALRKFLASL
TDSSEKEKRIIQELAIFKRINHSSDQGISSYTKLKGCKVLHHTAKLPADLRLSISVID
SSDEATIRLANMLKIEQLKTTSCLKLVLKDIENAFYSHEEVTQLMLWVLENLSSLKNE
NPNVLEWLTPLKFIQISQEQMVSAGELFDPDIEVLKDLFCNEEGTYFPPSVFTSPDIL
HSLRQIGLKNEASLKEKDVVQVAKKIEALQVGACPDQDVLLKKAKTLLLVLNKNHTLL
QSSEGKMTLKKIKWVPACKERPPNYPGSLVWKGDLCNLCAPPDMCDVGHAILIGSSLP
LVESIHVNLEKALGIFTKPSLSAVLKHFKIVVDWYSSKTFSDEDYYQFQHILLEIYGF
MHDHLNEGKDSFRALKFPWVWTGKKFCPLAQAVIKPIHDLDLQPYLHNVPKTMAKFHQ
LFKVCGSIEELTSDHISMVIQKIYLKSDQDLSEQESKQNLHLMLNIIRWLYSNQIPAS
PNTPVPIHHSKNPSKLIMKPIHECCYCDIKVDDLNDLLEDSVEPIILVHEDIPMKTAE
WLKVPCLSTRLINPENMGFEQSGQREPLTVRIKNILEEYPSVSDIFKELLQNADDANA
TECSFLIDMRRNMDIRENLLDPGMAACHGPALWSFNNSQFSDSDFVNITRLGESLKRG
EVDKVGKFGLGFNSVYHITDIPIIMSREFMIMFDPNINHISKHIKDKSNPGIKINWSK
QQKRLRKFPNQFKPFIDVFGCQLPLTVEAPYSYNGTLFRLSFRTQQEAKVSEVSSTCY
NTADIYSLVDEFSLCGHRLIIFTQSVKSMYLKYLKIEETNPSLAQDTVIIKKKSCSSK
ALNTPVLSVLKEAAKLMKTCSSSNKKLPSDEPKSSCILQITVEEFHHVFRRIADLQSP
LFRGPDDDPAALFEMAKSGQSKKPSDELSQKTVECTTWLLCTCMDTGEALKFSLSESG
RRLGLVPCGAVGVQLSEIQDQKWTVKPHIGEVFCYLPLRIKTGLPVHINGCFAVTSNR
KEIWKTDTKGRWNTTFMRHVIVKAYLQVLSVLRDLATSGELMDYTYYAVWPDPDLVHD
DFSVICQGFYEDIAHGKGKELTKVFSDGSTWVSMKNVRFLDDSILKRRDVGSAAFKIF
LKYLKKTGSKNLCAVELPSSVKLGFEEAGCKQILLENTFSEKQFFSEVFFPNIQEIEA
ELRDPLMIFVLNEKVDEFSGVLRVTPCIPCSLEGHPLVLPSRLIHPEGRVAKLFDIKD
GRFPYGSTQDYLNPIILIKLVQLGMAKDDILWDDMLERAVSVAEINKSDHVAACLRSS
ILLSLIDEKLKIRDPRAKDFAAKYQTIRFLPFLTKPAGFSLDWKGNSFKPETMFAATD
LYTAEHQDIVCLLQPILNENSHSFRGCGSVSLAVKEFLGLLKKPTVDLVINQLKEVAK
SVDDGITLYQENITNACYKYLHEALMQNEITKMSIIDKLKPFSFILVENAYVDSEKVS
FHLNFEAAPYLYQLPNKYKNNFRELFETVGVRQSCTVEDFALVLESIDQERGTKQITE
ENFQLCRRIISEGIWSLIREKKQEFCEKNYGKILLPDTNLMLLPAKSLCYNDCPWIKV
KDTTVKYCHADIPREVAVKLGAVPKRHKALERYASNVCFTTLGTEFGQKEKLTSRIKS
ILNAYPSEKEMLKELLQNADDAKATEICFVFDPRQHPVDRIFDDKWAPLQGPALCVYN
NQPFTEDDVRGIQNLGKGTKEGNPYKTGQYGIGFNSVYHITDCPSFISGRDILCIFDP
HARYAPGATSISPGRMFRDLDADFRTQFSDVLDLYLGTHFKLDNCTMFRFPLRNAEMA
KVSEISSVPASDRMVQNLLDKLRSDGAELLMFLNHMEKISICEIDKSTGALNVLYSVK
GKITDGDRLKRKQFHASVIDSVTKKRQLKDIPVQQITYTMDTEDSEGNLTTWLICNRS
GFSSMEKVSKSVISAHKNQDITLFPRGGVAACITHNYKKPHRAFCFLPLSLETGLPFH
VNGHFALDSARRNLWRDDNGVGVRSDWNNSLMTALIAPAYVELLIQLKKRYFPGSDPT
LSVLQNTPIHVVKDTLKKFLSFFPVNRLDLQPDLYCLVKALYNCIHEDMKRLLPVVRA
PNIDGSDLHSAVIITWINMSTSNKTRPFFDNLLQDELQHLKNADYNITTRKTVAENVY
RLKHLLLEIGFNLVYNCDETANLYHCLIDADIPVSYVTPADIRSFLMTFSSPDTNCHI
GKLPCRLQQTMLKLFHSLKLLVDYCFKDAEENEIEVEGLPLLITLDSVLQTFDAKRPK
FLTTYHELIPSRKDLFMNTLYLKYSNILLNCKVAKVFDISSFADLLSSVLPREYKTKS
CTKWKDNFASESWLKNAWHFISESVSVKEDQEETKPTFDIVVDTLKDWALLPGTKFTV
SANQLVVPEGDVLLPLSLMHIAVFPNAQSDKVFHALMKAGCIQLALNKICSKDSAFVP
LLSCHTANIESPTSILKALHYIWQTSTFRAEKLVENDFEALLMYFNCNLNHLMSQDDI
KILKSLPCYKSISGRYVSIGKFGTCYVLTKSIPSAEVEKWTQSSSSAFLEEKIHLKEL
YEVIGCVPVDDLEVYLKHLLPKIENLSYDAKLEHLIYLKNRLSSAEELSEIKEQLFEK
LESLLIIHDANSRLKQAKHFYDRTVRVFEVMLPEKLFIPNDFFKKLEQLIKPKNHVTF
MTSWVEFLRNIGLKYILSQQQLLQFAKEISVRANTENWSKETLQNTVDILLHHIFQER
MDLLSGNFLKELSLIPFLCPERAPAEFIRFHPQYQEVNGTLPLIKFNGAQVNPKFKQC
DVLQLLWTSCPILPEKATPLSIKEQEGSDLGPQEQLEQVLIMLNVNLDPPLDKVINNC
RNICNITTLDEEMVKTRAKVLRSIYEFLSAEKREFRFQLRGVAFVMVEDGWKLLKPEE
VVINLEYESDFKPYLYKLPLELGTFHQLFKHLGTEDIISTKQYVEVLSRIFKNSEGKQ
LDPNEMRTVKRVVSGLFRSLQNDSVKVRSDLENVRDLALYLPSQDGRLVKSSILVFDD
APHYKSRIQGNIGVQMLVDLSQCYLGKDHGFHTKLIMLFPQKLRPRLLSSILEEQLDE
ETPKVCQFGALCSLQGRLQLLLSSEQFITGLIRIMKHENDNAFLANEEKAIRLCKALR
EGLKVSCFEKLQTTLRVKGFNPIPHSRSETFAFLKRFGNAVILLYIQHSDSKDINFLL
ALAMTLKSATDNLISDTSYLIAMLGCNDIYRIGEKLDSLGVKYDSSEPSKLELPMPGT
PIPAEIHYTLLMDPMNVFYPGEYVGYLVDAEGGDIYGSYQPTYTYAIIVQEVEREDAD
NSSFLGKIYQIDIGYSEYKIVSSLDLYKFSRPEESSQSRDSAPSTPTSPTEFLTPGLR
SIPPLFSGRESHKTSSKHQSPKKLKVNSLPEILKEVTSVVEQAWKLPESERKKIIRRL
YLKWHPDKNPENHDIANEVFKHLQNEINRLEKQAFLDQNADRASRRTFSTSASRFQSD
KYSFQRFYTSWNQEATSHKSERQQQNKEKCPPSAGQTYSQRFFVPPTFKSVGNPVEAR
RWLRQARANFSAARNDLHKWANEWVCFKCYLSTKLALIAADYAVRGKSDKDVKPTALA
QKIEEYSQQLEGLTNDVHTLEAYGVDSLKTRYPDLLPFPQIPNDRFTSEVAMRVMECT
ACIIIKLENFMQQKV"

BASE COUNT     4163 a   2256 c   2487 g   3887 t
ORIGIN

atgatttaca ggaagaccat gtactcagct gcagcttcta aatccagaac gatttgcacg 
tcttatcaag gaagtaatga atacattctg gcctggcaga gaattgattg ttcaatggta 
tccatttgat gaaaacagaa atcacccatc tgtttcatgg cttaagatgg tttggaaaaa 
tctttatata catttttcag aggatttgac tttatttgat gagatgccac ttatccccag 
aactatacta gaggaaggtc agacatgtgt ggaactcatt agactcagga ttccatcgtt 
agtcatttta gacgatgaat ctgaagcaca gcttccagaa tttttagcag acattgtaca 
aaaacttgga gggtttgtcc ttaaaaaatt agatgcatct atacaacatc cgcttattaa 
aaaatatatt cattcaccat taccaagtgc tgttttgcag ataatggaga agatgccatt 
gcagaaattg tgtaatcaaa taacttcgct acttccaaca cacaaagatg ccctgaggaa 
gttcttggct agtttaaccg atagcagtga gaaagagaaa agaattattc aagaattggc 
aatattcaag cgcattaacc attcttctga tcagggaatt tcctcttata caaaattgaa 
aggttgtaaa gtcttacacc atactgccaa actcccagca gatctgcgac tttctatttc 
agtaatagac agtagtgatg aagctactat tcgtctggca aacatgttga aaatagaaca 
gttaaagacc actagctgct taaagcttgt tttaaaagat attgaaaatg cattttattc 
acatgaagag gtaacacagc ttatgttatg ggtccttgag aatctatctt ctcttaaaaa 
tgagaatcca aatgtgcttg agtggttaac accattaaaa ttcatccaga tatcacagga
acagatggta tcagctggtg aactctttga ccctgatata gaagtactaa aggatctctt 
ttgtaatgaa gaaggaacct atttcccacc ctcagttttt acctcaccag atattcttca 
ctccttaaga cagattggtt taaaaaacga agccagtctc aaagaaaagg atgttgtgca 
agtggcaaaa aaaattgaag ccttacaggt cggtgcttgt cctgatcaag atgttcttct 
gaagaaagcc aaaaccctct tactggtttt aaataagaat cacacactgt tgcaatcatc 
tgaaggaaag atgacattga agaaaataaa atgggttcca gcctgcaagg aaaggcctcc 
aaattatcca ggctctttgg tctggaaagg agatctctgt aatctctgtg caccaccaga 
tatgtgtgat gtaggccatg caattctcat tggctcctca cttcctcttg ttgaaagtat 
ccatgtaaac ctggaaaaag cattagggat cttcacaaaa cctagcctta gtgctgtctt 
aaaacacttt aaaattgttg ttgattggta ttcttcaaaa acctttagtg atgaagacta 
ctatcaattc cagcatattt tgcttgagat ttacggattc atgcatgatc atctaaatga 
agggaaagat tcttttagag ccttaaaatt tccatgggtt tggactggca aaaagttttg 
tccacttgcc caggctgtga ttaaaccaat ccatgatctt gaccttcagc cttatttgca 
taatgtacct aaaaccatgg caaaattcca ccaactattt aaggtctgtg gttcaataga 
ggagttgaca tcagatcata tttccatggt tattcagaag atatatctca aaagtgacca 
agatctcagt gaacaagaaa gcaaacaaaa tcttcatctt atgttgaata ttatcagatg 
gctgtatagc aatcagattc cagcaagccc caacacacca gttcctatac atcatagcaa 
aaatccttct aaacttatca tgaagccaat tcacgaatgc tgttattgtg acattaaagt 
tgatgacctt aatgacttac ttgaagattc tgtggaacca atcattttgg tgcatgagga 
catacccatg aaaactgcag aatggctaaa agttccatgc cttagtacaa gactgataaa
tcctgaaaac atgggatttg agcagtcagg acaaagagag ccacttactg taagaattaa 
aaatattctg gaagaatacc cttcagtgtc agatattttt aaagaactac ttcaaaacgc
tgatgatgca aatgcaacag aatgcagttt cttgattgat atgagaagaa atatggacat
aagagagaat ctcctagacc cagggatggc agcttgtcat ggacctgctt tgtggtcatt 
caacaattct caattctcag attcagattt tgtgaacata actaggttag gagaatcttt 
aaaaagggga gaagttgaca aagttggaaa atttggtctt ggatttaatt ctgtgtacca 
tatcactgac attcccatca ttatgagtcg ggaattcatg ataatgttcg atccaaacat
aaatcatatc agtaaacaca ttaaagacaa atccaatcct gggatcaaaa ttaattggag 
taaacaacag aaaagactta gaaaatttcc taatcagttc aaaccattta tagatgtatt
tggctgtcag ttacctttga ctgtagaagc accttacagc tataatggaa cccttttccg
actgtccttt agaactcaac aggaagcaaa agtgagtgaa gttagtagta cgtgctacaa 
tacagcagat atttattctc ttgtggatga atttagtctc tgtggacaca ggcttatcat
tttcactcag agtgtaaagt caatgtattt gaagtacttg aaaattgagg aaaccaaccc
cagtttagca caagatacag taataattaa aaaaaaatcc tgctcttcca aagcattgaa
cacacctgtc ttaagtgttt taaaagaggc tgctaagctc atgaagactt gcagcagcag
taataaaaag cttcccagtg atgaaccaaa gtcatcttgc attcttcaga tcacagtgga 
agaatttcac catgtgttca gaaggattgc tgatttacag tcgccacttt ttagaggtcc 
agatgatgac ccagctgctc tctttgaaat ggctaagtct ggccaatcaa aaaagccatc 
agatgagttg tcacagaaaa cagtagagtg taccacgtgg cttctgtgta cttgcatgga 
cacaggagag gctctgaagt tttccctgag tgagagtgga agaagactag gactggttcc 
atgtggggca gtaggagttc agctgtcaga aatccaggac cagaagtgga cagtgaaacc 
acacattgga gaggtgtttt gctatttacc tttacgaata aaaacaggct tgccagttca 
tatcaatggg tgctttgctg ttacatcaaa taggaaagaa atctggaaaa cagatacaaa 
aggacgatgg aataccacgt tcatgagaca tgttattgtg aaagcttact tacaggtact 
gagtgtctta cgggacctgg ccactagtgg ggagctaatg gattatactt actatgcagt 
atggcccgat cctgatttag ttcatgatga tttttctgta atttgccaag gattttatga
agatatagct catggaaaag ggaaagaact gaccaaagtc ttctctgatg gatctacttg 
ggtttccatg aagaacgtaa gatttctaga tgactctata cttaaaagaa gagatgttgg 
ttcagcagcc ttcaagatat ttttgaaata cctcaagaag actgggtcca aaaacctttg 
tgctgttgaa cttccttctt cggtaaaatt aggatttgaa gaagctggct gcaaacagat 
actacttgaa aacacatttt cagagaaaca gtttttttct gaagtgtttt ttccaaatat 
tcaagaaatt gaagcagaac ttagagatcc tttaatgatc tttgttctaa atgaaaaagt 
tgatgagttc tcgggagttc ttcgtgttac tccatgtatt ccttgttcct tggaggggca 
tcctttggtt ttgccatcaa gattgatcca ccccgaagga cgagttgcaa agttatttga
tattaaagat gggagattcc cttatggttc tactcaggat tatctcaatc ctattatttt 
gattaaacta gttcagttag gtatggcaaa agatgatatt ttatgggatg atatgctaga 
acgtgcagtg tcagtagctg aaattaataa aagtgatcat gttgctgcat gcctaagaag 
tagtatctta ttgagtctta tcgatgagaa actaaaaata agggatccta gagcaaagga 
ttttgctgca aaatatcaaa caatccgctt ccttccattt ctgacaaaac cagcaggttt 
ttctttggac tggaaaggca acagttttaa gcctgaaacc atgtttgcag caactgacct 
ttatacagct gaacatcaag atatagtttg tcttttgcaa ccaattctaa atgaaaattc 
ccattctttt agaggttgtg gttcagtgtc attggctgtt aaagagtttt tgggattact 
caagaagcca acagttgatc tggttataaa ccaattgaaa gaagtagcaa aatcagttga 
tgatggaatt acactgtacc aggagaatat caccaatgct tgctacaaat accttcatga 
agccttgatg caaaatgaaa tcactaagat gtcaattatt gataagttaa aaccctttag 
cttcattcta gttgagaatg catatgttga ctcagaaaag gtttcttttc atttaaattt 
tgaggcggca ccataccttt atcagttgcc taataagtat aaaaataatt tccgcgaact 
ttttgaaacc gtgggtgtga ggcagtcatg cactgttgaa gattttgctc ttgttttgga 
atctattgat caagaaagag gaacaaagca aataacagaa gagaattttc agctttgccg 
acgaataatc agtgaaggaa tatggagtct cattagagaa aagaaacaag aattttgtga 
gaaaaattat ggcaagatat tattgccaga tactaatctt atgcttctcc ctgctaaatc 
gttatgctac aatgattgcc cttggataaa agtaaaggat accactgtaa aatattgtca 
tgctgacata cccagggaag tagcagtaaa actaggagca gtcccaaagc gacacaaagc 
cttagaaaga tatgcatcca atgtctgttt tacaacactt ggcacagaat ttgggcagaa 
agaaaaattg accagcagaa ttaagagcat ccttaatgca tatccttctg aaaaggaaat 
gttgaaagag cttcttcaaa atgctgatga tgcaaaggcg acagaaatct gttttgtgtt 
tgatcctaga cagcatccag ttgatagaat atttgatgat aagtgggccc cattgcaagg 
gccagcactt tgtgtgtaca acaaccagcc atttacagaa gatgatgtta gaggaattca 
gaatcttgga aaaggcacga aagagggaaa tccttataaa actggacagt atggaatagg 
attcaattct gtgtatcata tcacagactg cccatctttt atttctggca atgacatcct 
gtgtattttt gatcctcatg ccagatatgc accaggggcc acatccatta gtcccggacg 
catgtttaga gatttggatg cagattttag gacacagttc tcagatgttc tggatcttta 
tctgggaacc cattttaaac tggataattg cacaatgttc agatttcctc ttcgtaatgc 
agaaatggca aaagtttcgg aaatttcgtc tgttccagca tcagacagaa tggtccagaa 
tcttttggac aaactgcgct cagatggggc agaacttcta atgtttctta atcacatgga 
aaaaatttct atttgtgaaa tagataagag tactggagct ctaaatgtgc tgtattcagt 
aaagggcaaa atcacagatg gagacagatt gaaaaggaaa caatttcatg catctgtaat 
tgatagtgtt actaaaaaga ggcagctcaa agacatacca gttcaacaaa taacctatac 
tatggatact gaggactctg aaggaaatct tactacgtgg ctaatttgta atagatcagg 
cttttcaagt atggagaaag tatctaaaag tgtcatatca gctcacaaga accaagatat 
tactcttttc ccacgtggtg gagtagctgc ctgcattact cacaactata aaaaacccca 
tagggccttc tgttttttgc ctctttcttt ggagactggg ctgccatttc atgtgaatgg 
ccactttgca ctggattcag ccagaaggaa cctgtggcgt gatgataatg gagttggtgt 
tcgaagtgac tggaataaca gtttaatgac agcattaata gctcctgcat atgttgaatt 
gctaatacag ttaaaaaaac ggtatttccc tggttctgat ccaacattat cagtgttaca 
gaacacccct attcatgttg taaaggacac tttaaagaag tttttatcgt ttttcccagt 
taaccgtctt gatctacagc cagatttata ttgtctagtg aaagcacttt acaattgcat 
tcacgaagac atgaaacgtc ttttacctgt tgtgcgggct ccaaatattg atggctctga 
cttgcactct gcagttataa ttacttggat caatatgtct acttctaata aaactagacc 
attttttgac aatttactac aggatgaatt acaacacctt aaaaatgcag attataatat 
caccacacgc aaaacagtag cagagaatgt ctataggctg aaacatctcc ttttagaaat 
tggtttcaac ttggtttata actgtgatga aactgctaat ctttaccact gtcttataga 
tgcagatatt cctgttagtt atgtgacccc tgctgatatc agatcttttt taatgacatt 
ttcctctcct gacactaatt gccatattgg gaagctgcct tgtcgtctgc agcagactaa 
tctaaaactt tttcatagtt taaaactttt agttgattat tgttttaaag atgcagaaga 
aaatgagatt gaagttgagg gattgcccct tctcatcaca ctggacagtg ttttgcaaac 
ttttgatgca aaacgaccca agtttctaac aacatatcat gaattgattc catcccgcaa 
agacttgttt atgaatacat tatatttgaa atatagtaat attttattga actgtaaagt
tgcaaaagtg tttgacattt ccagctttgc tgatttgtta tcctctgtgt tgcctcgaga
atataagacc aaaagttgca caaagtggaa agacaatttt gcaagtgagt cttggcttaa
gaatgcatgg cattttatta gtgaatctgt aagtgtgaaa gaagatcagg aagaaacaaa
accaacattt gacattgttg ttgatactct aaaagactgg gcattgcttc caggaacaaa
gtttactgtt tcagccaacc agcttgtggt tcctgaagga gatgttctgc ttcctctcag
ccttatgcac attgcagttt ttccaaatgc ccagagtgat aaagtttttc atgctctaat
gaaagccggc tgtattcagc ttgctttgaa caaaatctgt tccaaagaca gtgcatttgt
tcctttgttg tcatgtcaca cagcaaatat agagagcccc acaagcatct tgaaggctct
acattatatg gtccaaactt caacatttag agcagaaaaa ttagtagaaa atgattttga
ggcacttttg atgtatttca actgcaattt gaatcatttg atgtcccaag atgatataaa
aattctaaag tcacttccgt gctataaatc catcagtggc cgctatgtaa gcattggaaa
atttggaaca tgctacgtac ttacaaaaag tatcccttca gctgaagtgg agaaatggac
acaatcatca tcatctgcat ttcttgaaga aaaaatacac ttaaaagaac tatatgaggt
gattggttgt gtacctgtag atgatcttga ggtatatttg aaacacctct taccaaaaat
tgaaaatctc tcttatgatg caaaattaga gcacttgatc taccttaaga atagattatc
aagtgctgag gaattatcag agattaagga acaacttttt gaaaaactgg aaagtttatt
gataatccat gatgctaaca gtagactaaa gcaagcaaag catttctatg atagaactgt
gagagttttt gaagttatgc ttcctgaaaa attgtttatt cctaatgatt tctttaagaa
attggaacaa cttataaaac ccaaaaatca tgttacattt atgacatcct gggtggaatt
cttaagaaat attggactaa aatacatact ttctcagcag cagttgttac agtttgctaa
ggaaatcagt gtgagggcta atacagaaaa ctggtccaaa gaaacattgc aaaatacagt
tgatatcctt ctgcatcata tattccaaga acgaatggat ttgttatctg gaaattttct
gaaagaacta tctttaatac cattcttatg tcctgagcgg gcccccgcgg aattcattag
atttcatcct caatatcaag aggtaaatgg aacacttcct cttataaagt tcaatggagc
acaggtaaat ccaaaattca agcaatgtga tgtactccag ctgttatgga catcctgccc
tattcttcca gagaaagcta cacccttaag cattaaagaa caagaaggta gtgaccttgg
tccacaagaa cagcttgaac aagttttaaa tatgcttaat gttaacctgg atcctcctct
tgataaggta atcaataact gcagaaacat atgcaacata acgacgttgg atgaagaaat
ggtaaaaact agagcaaaag tcttaaggag catatatgaa ttcctcagtg cagaaaaaag
ggaatttcgt tttcagttgc gaggggttgc ttttgtgatg gtagaagatg gttggaaact
tctgaagcct gaggaggtag tcataaacct agaatatgaa tctgatttta aaccttattt
gtacaagcta cctttagaac ttggcacatt tcaccagttg ttcaaacact taggtactga
agatattatt tcaactaagc aatatgttga agtgttgagc cgcatattta aaaattctga
gggcaaacaa ttagatccta atgaaatgcg tacagttaag agagtagttt ctggtctgtt
caggagtcta cagaatgatt cagtcaaggt gaggagtgat ctcgagaatg tacgagacct
tgcgctttac ctcccaagcc aggatggtag attggtaaag tcaagcatct tagtgtttga
cgatgcgcca cattataaaa gtagaatcca ggggaatatt ggtgtgcaaa tgttagttga
tctcagccag tgctacttag ggaaagacca tggatttcac actaagttga taatgctctt
tcctcaaaaa cttagacctc gattattgag cagtatactt gaagaacaat tagatgaaga
gactcccaaa gtttgtcagt ttggagcgtt gtgttctctt caaggaagat tgcagttact
cttgtcttct gaacagttca ttacaggact gattagaatt atgaagcatg aaaatgataa
tgcttttctg gccaatgaag aaaaagccat aagactttgc aaagccctaa gagaaggatt
gaaagtatcc tgctttgaaa agcttcaaac aacattaaga gttaaaggtt ttaatcctat
tccccacagc agaagtgaaa cttttgcttt tttgaagcga tttggtaatg cagtcatctt
gctctacatt caacattcag acagtaaaga cattaatttc ctgttagcac tggcaatgac
tcttaaatca gcaactgaca atttgatttc tgacacttca tatttaattg ctatgctagg
atgcaatgat atttacagga ttggtgagaa acttgacagt ttaggagtga aatatgactc
ttcggagcca tcaaaactgg aacttccaat gcctggcaca ccaattcctg ctgaaattca
ttacactctg cttatggacc caatgaatgt tttttacccg ggagaatatg ttgggtacct
tgttgatgct gaaggtggtg atatctatgg atcataccag ccaacataca catatgcaat
tattgtacaa gaagttgaaa gagaagatgc tgacaattct agttttctag gaaagatata
tcagatagat attggttata gtgaatataa aatagttagc tctcttgatc tgtataagtt
ttcaagacct gaggaaagct ctcaaagcag ggacagtgct ccttctacac caaccagccc
cactgagttc ctcacccctg gcctgagaag cattcctcct cttttctctg gtagagagag
ccacaagact tcttccaaac atcagtcccc caaaaagctt aaggttaatt ctttaccaga
aatcttaaaa gaagtgacat ctgtggtgga gcaagcatgg aagcttccag aatcggaacg
aaaaaagatt attaggcggt tgtatttgaa atggcatcct gacaaaaatc cagagaacca
tgacattgcc aatgaagttt ttaaacattt gcagaatgaa atcaacagat tagaaaaaca
ggcttttcta gatcaaaatg cagacagggc ctccagacga acattttcaa cctcagcatc
ccgatttcag tcagacaaat actcatttca gagattctat acttcatgga atcaagaagc
aacgagccat aaatctgaaa gacagcaaca gaacaaagaa aaatgccccc cttcagccgg
acagacttac tctcaaaggt tctttgttcc tcccactttc aagtcggttg gcaatccagt
ggaagcacgc agatggctaa gacaagccag agcaaacttc tcagctgcca ggaatgacct
tcataaaaat gccaatgagt gggtgtgctt taaatgttac ctttctacca agttagcttt
gattgcagct gactatgctg tgaggggaaa gtctgataaa gatgtaaaac caactgcact
tgctcagaaa atagaggaat atagtcagca acttgaagga ctgacaaatg atgttcacac
attggaagct tatggtgtag acagtttaaa aacaagatac cctgatttgc ttccctttcc
tcagatccca aatgacaggt tcacttctga ggttgctatg agggtgatgg aatgtactgc
ctgtatcata ataaaacttg aaaattttat gcaacaaaaa gtgtgaagat atttaacgaa
aaaaaaggta gatcttgaat gtgttgtagc acgaataaat tgctgtactt cattaagctt
cattgccaat tagctaggaa ttgttaagca cattgcagat tgttcttgga gaattctgga
gttgttatga acatgaatac caacggaaaa ccttaactga atctaaaaga aaactatttt
gaagatggtg gtgagctgca aaatagctgg atggatttga atgattggga tgatacatca
ttgaactgca ctttatataa ccaaagctta gcagtttgtt agataagagt ctatgtatgt
ctctggttag gatgaagtta attttatgtt tttaacatgg tatttttgaa ggagctaatg
aaacactgga catataattg gtttaaacat ^a-ggggaatt aagtctttgt agtctgtcat
ttttttaagt ggatcctctt ggatgcgtta ttttctcatc agctggctct gatcatgaat
ttgttgtaat tttatgttgt actcagtgca tttaagaaat ggtagagtat tttaatccta
ttacttgact aagagtgtga aggtagtact ttttagagtg cactgagtgc actttacatc
tttatttaaa ttttttttta acatcttatg tttacaggct tcctgtttga tgaagatagc
aacggaaaac tcaaaatggt ggcagttctt attaccagtt gttagtattg tttctggaaa
ctgcttgcca agacaacatt tattaactgt tagaacactt gctttatgtt tgtgtgtaca
tattttccac aaatgttata atttatatag tgtggttgaa caggatgcaa tcttttgttg
tctaaaggtg ctgcagttaa aaaaaaaaca accttttctt tcaatatggc atgtagtgga
gtttttttaa ctttaaaaac atcaaaaatt gttaaaatca ttgtgttatc tagtagttta
taattatcgg cttatatttc cccatgaatg atcagaactg acatttaatt catgtttgtc
tcgccatgct tctttacttt aacatatttc ttttgcagaa tgtaaaaggt aatgataatt
agtttatata agtgtactgg ctgtaaatga tgctaaatat actttatgca attaagggct
tacagaacat gttgaaactt tttttacttt tattgggaat aaggaatgtt tgcacctcca
cattttattg ctt



Figure 9F 



